#
eb524d0f |
|
06-Dec-2023 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, expose eswitch manager vport Expose the ability the query the eswitch manager vport number. Next patch will utilize this capability to reveal the correct register C0 value to the users. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/614fb0e216250e2ce3340471ec141b83ec45c7f4.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
04ad04e4 |
|
06-Oct-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Refactor mlx5_flow_destination->rep pointer to vport num Currently the destination rep pointer is only used for comparisons or to obtain vport number from it. Since it is used both during flow creation and deletion it may point to representor of another eswitch instance which can be deallocated during driver unload even when there are rules pointing to it[0]. Refactor the code to store vport number and 'valid' flag instead of the representor pointer. [0]: [176805.886303] ================================================================== [176805.889433] BUG: KASAN: slab-use-after-free in esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.892981] Read of size 2 at addr ffff888155090aa0 by task modprobe/27280 [176805.895462] CPU: 3 PID: 27280 Comm: modprobe Tainted: G B 6.6.0-rc3+ #1 [176805.896771] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [176805.898514] Call Trace: [176805.899026] <TASK> [176805.899519] dump_stack_lvl+0x33/0x50 [176805.900221] print_report+0xc2/0x610 [176805.900893] ? mlx5_chains_put_table+0x33d/0x8d0 [mlx5_core] [176805.901897] ? esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.902852] kasan_report+0xac/0xe0 [176805.903509] ? esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.904461] esw_cleanup_dests+0x390/0x440 [mlx5_core] [176805.905223] __mlx5_eswitch_del_rule+0x1ae/0x460 [mlx5_core] [176805.906044] ? esw_cleanup_dests+0x440/0x440 [mlx5_core] [176805.906822] ? xas_find_conflict+0x420/0x420 [176805.907496] ? down_read+0x11e/0x200 [176805.908046] mlx5e_tc_rule_unoffload+0xc4/0x2a0 [mlx5_core] [176805.908844] mlx5e_tc_del_fdb_flow+0x7da/0xb10 [mlx5_core] [176805.909597] mlx5e_flow_put+0x4b/0x80 [mlx5_core] [176805.910275] mlx5e_delete_flower+0x5b4/0xb70 [mlx5_core] [176805.911010] tc_setup_cb_reoffload+0x27/0xb0 [176805.911648] fl_reoffload+0x62d/0x900 [cls_flower] [176805.912313] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core] [176805.913151] ? __fl_put+0x230/0x230 [cls_flower] [176805.913768] ? filter_irq_stacks+0x90/0x90 [176805.914335] ? kasan_save_stack+0x1e/0x40 [176805.914893] ? kasan_set_track+0x21/0x30 [176805.915484] ? kasan_save_free_info+0x27/0x40 [176805.916105] tcf_block_playback_offloads+0x79/0x1f0 [176805.916773] ? mlx5e_rep_indr_block_unbind+0xd0/0xd0 [mlx5_core] [176805.917647] tcf_block_unbind+0x12d/0x330 [176805.918239] tcf_block_offload_cmd.isra.0+0x24e/0x320 [176805.918953] ? tcf_block_bind+0x770/0x770 [176805.919551] ? _raw_read_unlock_irqrestore+0x30/0x30 [176805.920236] ? mutex_lock+0x7d/0xd0 [176805.920735] ? mutex_unlock+0x80/0xd0 [176805.921255] tcf_block_offload_unbind+0xa5/0x120 [176805.921909] __tcf_block_put+0xc2/0x2d0 [176805.922467] ingress_destroy+0xf4/0x3d0 [sch_ingress] [176805.923178] __qdisc_destroy+0x9d/0x280 [176805.923741] dev_shutdown+0x1c6/0x330 [176805.924295] unregister_netdevice_many_notify+0x6ef/0x1500 [176805.925034] ? netdev_freemem+0x50/0x50 [176805.925610] ? _raw_spin_lock_irq+0x7b/0xd0 [176805.926235] ? _raw_spin_lock_bh+0xe0/0xe0 [176805.926849] unregister_netdevice_queue+0x1e0/0x280 [176805.927592] ? unregister_netdevice_many+0x10/0x10 [176805.928275] unregister_netdev+0x18/0x20 [176805.928835] mlx5e_vport_rep_unload+0xc0/0x200 [mlx5_core] [176805.929608] mlx5_esw_offloads_unload_rep+0x9d/0xc0 [mlx5_core] [176805.930492] mlx5_eswitch_unload_vf_vports+0x108/0x1a0 [mlx5_core] [176805.931422] ? mlx5_eswitch_unload_sf_vport+0x50/0x50 [mlx5_core] [176805.932304] ? rwsem_down_write_slowpath+0x11f0/0x11f0 [176805.932987] mlx5_eswitch_disable_sriov+0x6f9/0xa60 [mlx5_core] [176805.933807] ? mlx5_core_disable_hca+0xe1/0x130 [mlx5_core] [176805.934576] ? mlx5_eswitch_disable_locked+0x580/0x580 [mlx5_core] [176805.935463] mlx5_device_disable_sriov+0x138/0x490 [mlx5_core] [176805.936308] mlx5_sriov_disable+0x8c/0xb0 [mlx5_core] [176805.937063] remove_one+0x7f/0x210 [mlx5_core] [176805.937711] pci_device_remove+0x96/0x1c0 [176805.938289] device_release_driver_internal+0x361/0x520 [176805.938981] ? kobject_put+0x5c/0x330 [176805.939553] driver_detach+0xd7/0x1d0 [176805.940101] bus_remove_driver+0x11f/0x290 [176805.943847] pci_unregister_driver+0x23/0x1f0 [176805.944505] mlx5_cleanup+0xc/0x20 [mlx5_core] [176805.945189] __x64_sys_delete_module+0x2b3/0x450 [176805.945837] ? module_flags+0x300/0x300 [176805.946377] ? dput+0xc2/0x830 [176805.946848] ? __kasan_record_aux_stack+0x9c/0xb0 [176805.947555] ? __call_rcu_common.constprop.0+0x46c/0xb50 [176805.948338] ? fpregs_assert_state_consistent+0x1d/0xa0 [176805.949055] ? exit_to_user_mode_prepare+0x30/0x120 [176805.949713] do_syscall_64+0x3d/0x90 [176805.950226] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.950904] RIP: 0033:0x7f7f42c3f5ab [176805.951462] Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 [176805.953710] RSP: 002b:00007fff07dc9d08 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 [176805.954691] RAX: ffffffffffffffda RBX: 000055b6e91c01e0 RCX: 00007f7f42c3f5ab [176805.955691] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b6e91c0248 [176805.956662] RBP: 000055b6e91c01e0 R08: 0000000000000000 R09: 0000000000000000 [176805.957601] R10: 00007f7f42d9eac0 R11: 0000000000000206 R12: 000055b6e91c0248 [176805.958593] R13: 0000000000000000 R14: 000055b6e91bfb38 R15: 0000000000000000 [176805.959599] </TASK> [176805.960324] Allocated by task 20490: [176805.960893] kasan_save_stack+0x1e/0x40 [176805.961463] kasan_set_track+0x21/0x30 [176805.962019] __kasan_kmalloc+0x77/0x90 [176805.962554] esw_offloads_init+0x1bb/0x480 [mlx5_core] [176805.963318] mlx5_eswitch_init+0xc70/0x15c0 [mlx5_core] [176805.964092] mlx5_init_one_devl_locked+0x366/0x1230 [mlx5_core] [176805.964902] probe_one+0x6f7/0xc90 [mlx5_core] [176805.965541] local_pci_probe+0xd7/0x180 [176805.966075] pci_device_probe+0x231/0x6f0 [176805.966631] really_probe+0x1d4/0xb50 [176805.967179] __driver_probe_device+0x18d/0x450 [176805.967810] driver_probe_device+0x49/0x120 [176805.968431] __driver_attach+0x1fb/0x490 [176805.968976] bus_for_each_dev+0xed/0x170 [176805.969560] bus_add_driver+0x21a/0x570 [176805.970124] driver_register+0x133/0x460 [176805.970684] 0xffffffffa0678065 [176805.971180] do_one_initcall+0x92/0x2b0 [176805.971744] do_init_module+0x22d/0x720 [176805.972318] load_module+0x58c3/0x63b0 [176805.972847] init_module_from_file+0xd2/0x130 [176805.973441] __x64_sys_finit_module+0x389/0x7c0 [176805.974045] do_syscall_64+0x3d/0x90 [176805.974556] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.975566] Freed by task 27280: [176805.976077] kasan_save_stack+0x1e/0x40 [176805.976655] kasan_set_track+0x21/0x30 [176805.977221] kasan_save_free_info+0x27/0x40 [176805.977834] ____kasan_slab_free+0x11a/0x1b0 [176805.978505] __kmem_cache_free+0x163/0x2d0 [176805.979113] esw_offloads_cleanup_reps+0xb8/0x120 [mlx5_core] [176805.979963] mlx5_eswitch_cleanup+0x182/0x270 [mlx5_core] [176805.980763] mlx5_cleanup_once+0x9a/0x1e0 [mlx5_core] [176805.981477] mlx5_uninit_one+0xa9/0x180 [mlx5_core] [176805.982196] remove_one+0x8f/0x210 [mlx5_core] [176805.982868] pci_device_remove+0x96/0x1c0 [176805.983461] device_release_driver_internal+0x361/0x520 [176805.984169] driver_detach+0xd7/0x1d0 [176805.984702] bus_remove_driver+0x11f/0x290 [176805.985261] pci_unregister_driver+0x23/0x1f0 [176805.985847] mlx5_cleanup+0xc/0x20 [mlx5_core] [176805.986483] __x64_sys_delete_module+0x2b3/0x450 [176805.987126] do_syscall_64+0x3d/0x90 [176805.987665] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176805.988667] Last potentially related work creation: [176805.989305] kasan_save_stack+0x1e/0x40 [176805.989839] __kasan_record_aux_stack+0x9c/0xb0 [176805.990443] kvfree_call_rcu+0x84/0xa30 [176805.990973] clean_xps_maps+0x265/0x6e0 [176805.991547] netif_reset_xps_queues.part.0+0x3f/0x80 [176805.992226] unregister_netdevice_many_notify+0xfcf/0x1500 [176805.992966] unregister_netdevice_queue+0x1e0/0x280 [176805.993638] unregister_netdev+0x18/0x20 [176805.994205] mlx5e_remove+0xba/0x1e0 [mlx5_core] [176805.994872] auxiliary_bus_remove+0x52/0x70 [176805.995490] device_release_driver_internal+0x361/0x520 [176805.996196] bus_remove_device+0x1e1/0x3d0 [176805.996767] device_del+0x390/0x980 [176805.997270] mlx5_rescan_drivers_locked.part.0+0x130/0x540 [mlx5_core] [176805.998195] mlx5_unregister_device+0x77/0xc0 [mlx5_core] [176805.998989] mlx5_uninit_one+0x41/0x180 [mlx5_core] [176805.999719] remove_one+0x8f/0x210 [mlx5_core] [176806.000387] pci_device_remove+0x96/0x1c0 [176806.000938] device_release_driver_internal+0x361/0x520 [176806.001612] unbind_store+0xd8/0xf0 [176806.002108] kernfs_fop_write_iter+0x2c0/0x440 [176806.002748] vfs_write+0x725/0xba0 [176806.003294] ksys_write+0xed/0x1c0 [176806.003823] do_syscall_64+0x3d/0x90 [176806.004357] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [176806.005317] The buggy address belongs to the object at ffff888155090a80 which belongs to the cache kmalloc-64 of size 64 [176806.006774] The buggy address is located 32 bytes inside of freed 64-byte region [ffff888155090a80, ffff888155090ac0) [176806.008773] The buggy address belongs to the physical page: [176806.009480] page:00000000a407e0e6 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x155090 [176806.010633] flags: 0x200000000000800(slab|node=0|zone=2) [176806.011352] page_type: 0xffffffff() [176806.011905] raw: 0200000000000800 ffff888100042640 ffffea000422b1c0 dead000000000004 [176806.012949] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [176806.013933] page dumped because: kasan: bad access detected [176806.014935] Memory state around the buggy address: [176806.015601] ffff888155090980: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.016568] ffff888155090a00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.017497] >ffff888155090a80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.018438] ^ [176806.019007] ffff888155090b00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.020001] ffff888155090b80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [176806.020996] ================================================================== Fixes: a508728a4c8b ("net/mlx5e: VF tunnel RX traffic offloading") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
baac8351 |
|
10-Oct-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Reduce eswitch mode_lock protection context Currently eswitch mode_lock is so heavy, for example, it's locked during the whole process of the mode change, which may need to hold other locks. As the mode_lock is also used by IPSec to block mode and encap change now, it is easy to cause lock dependency. Since some of protections are also done by devlink lock, the eswitch mode_lock is not needed at those places, and thus the possibility of lockdep issue is reduced. Fixes: c8e350e62fc5 ("net/mlx5e: Make TC and IPsec offloads mutually exclusive on a netdev") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
#
b691b111 |
|
25-Aug-2023 |
Dima Chumak <dchumak@nvidia.com> |
net/mlx5: Implement devlink port function cmds to control ipsec_packet Implement devlink port function commands to enable / disable IPsec packet offloads. This is used to control the IPsec capability of the device. When ipsec_offload is enabled for a VF, it prevents adding IPsec packet offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec packet offloads on the PF, it's not allowed to enable ipsec_packet on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
06bab696 |
|
25-Aug-2023 |
Dima Chumak <dchumak@nvidia.com> |
net/mlx5: Implement devlink port function cmds to control ipsec_crypto Implement devlink port function commands to enable / disable IPsec crypto offloads. This is used to control the IPsec capability of the device. When ipsec_crypto is enabled for a VF, it prevents adding IPsec crypto offloads on the PF, because the two cannot be active simultaneously due to HW constraints. Conversely, if there are any active IPsec crypto offloads on the PF, it's not allowed to enable ipsec_crypto on a VF, until PF IPsec offloads are cleared. Signed-off-by: Dima Chumak <dchumak@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8efd7b17 |
|
25-Aug-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Provide an interface to block change of IPsec capabilities mlx5 HW can't perform IPsec offload operation simultaneously both on PF and VFs at the same time. While the previous patches added devlink knobs to change IPsec capabilities dynamically, there is a need to add a logic to block such IPsec capabilities for the cases when IPsec is already configured. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
e2537341 |
|
25-Aug-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5e: Rewrite IPsec vs. TC block interface In the commit 366e46242b8e ("net/mlx5e: Make IPsec offload work together with eswitch and TC"), new API to block IPsec vs. TC creation was introduced. Internally, that API used devlink lock to avoid races with userspace, but it is not really needed as dev->priv.eswitch is stable and can't be changed. So remove dependency on devlink lock and move block encap code back to its original place. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7d833520 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Store vport in struct mlx5_devlink_port and use it in port ops Instead of using internal devlink_port->index to perform vport lookup in every devlink port op, store the vport pointer to the container struct mlx5_devlink_port and use it directly in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5c632cc3 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Relax mlx5_devlink_eswitch_get() return value checking If called from port ops, it is not needed to perform the checks in mlx5_devlink_eswitch_get(). The reason is devlink port would not be registered if the checks are not true. Introduce relaxed version mlx5_devlink_eswitch_nocheck_get() and use it in port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2caa2a39 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Reduce number of vport lookups passing vport pointer instead of index During devlink port init/cleanup and register/unregister calls, there are many lookups of vport. Instead of passing vport_num as argument to functions, pass the vport struct pointer directly and avoid repeated lookups. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2c5f33f6 |
|
31-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Embed struct devlink_port into driver structure Struct devlink_port is usually embedded in a driver-specific struct which allows to carry driver context to devlink port ops. Introduce a container struct to include devlink_port struct in preparation to also include driver context for devlink port ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
13f878a2 |
|
01-Jun-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Don't register ops for non-PF/VF/SF port and avoid checks in ops Currently each PF/VF/SF devlink port op called into mlx5 code calls is_port_function_supported() to check if the port is either PF, VF or SF. So make sure that the ops are registered with devlink port only for those and avoid the is_port_function_supported() checks in ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b940ec4b |
|
26-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Remove no longer used mlx5_esw_offloads_sf_vport_enable/disable() Since the previous patch removed the only users of these functions, remove them. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e855afd7 |
|
26-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Introduce mlx5_eswitch_load/unload_sf_vport() and use it from SF code Similar to the PF/VF helpers, introduce a set of load/unload helpers for SF vports. From there, call mlx5_eswitch_load/unload_vport() which are common for PFs/VFs and newly introduced SF helpers. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d9833bcf |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Push devlink port PF/VF init/cleanup calls out of devlink_port_register/unregister() In order to prepare for mlx5_esw_offloads_devlink_port_register/unregister() to be used for SFs as well, push out the PF/VF specific init/cleanup calls outside. Introduce mlx5_eswitch_load/unload_pf_vf_vport() and call them from there. Use these new helpers of PF/VF loading and make mlx5_eswitch_local/unload_vport() reusable for SFs. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d1569537 |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Modify and restore TC rules for IPSec TX rules After IPsec policy/state TX rules are added, any TC flow rule, which forwards packets to uplink, is modified to forward to IPsec TX tables. As these tables are destroyed dynamically, whenever there is no reference to them, the destinations of this kind of rules must be restored to uplink. There is a special case for packet encapsulation, as the packet_reformat_id in the extended destination is used to reformat packets, but only for the VPORT destination. To forward packet to IPsec table and do encapsulation in one FTE, move the packet_reformat_id to flow context, instead of using the extended destination. As a limitation, multiple encapsulations with table forwarding, and one together with other VPORT destinations, are not allowed, so add a check when offloading TC rules. TC rules are not allowed before IPsec TX rule is added, so only need to restore TC rules after flush IPSec TX rules. As they are saved in the vport_rep rhashtables, we walk all the rules in the rhashtables, and find TC rules with destinations pointing to IPsec tables, and modify them one by one. To avoid concurrent issue, this handling is done under the protection of eswitch mode_lock. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/7bcb2c7e2ecf0e0d06b095c8dcc6a37ea7f02faf.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
366e4624 |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Make IPsec offload work together with eswitch and TC The eswitch mode is not allowed to change if there are any IPsec rules. Besides, by using mlx5_esw_try_lock() to get eswitch mode lock, IPsec rules are not allowed to be offloaded if there are any TC rules. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/e442b512b21a931fbdfb87d57ae428c37badd58a.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
c6c2bf5d |
|
31-Jul-2023 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5e: Support IPsec packet offload for TX in switchdev mode The IPsec encryption is done at the last, so add new prio for IPsec offload in FDB, and put it just lower than the slow path prio and higher than the per-vport prio. Three levels are added for TX. The first one is for ip xfrm policy. The sa table is created in the second level for ip xfrm state. The status table is created at the last to count the number of packets encrypted. The rules, which forward packets to uplink, are changed to forward them to IPsec TX tables first. These rules are restored after those tables are destroyed, which is done immediately when there is no reference to them, just as what does in legacy mode. The support for slow path is added here, by refreshing uplink's channels. But, the handling for TC fast path, which is more complicated, will be added later. Besides, reg c4 is used instead to match reqid. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/cfd0e6ffaf0b8c55ebaa9fb0649b7c504b6b8ec6.1690802064.git.leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
9eca8bb8 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Give esw_offloads_load/unload_rep() "mlx5_" prefix As esw_offloads_load/unload_rep() are used outside eswitch.c it is nicer for them to have "mlx5_" prefix. Add it. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
329980d0 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Make mlx5_eswitch_load/unload_vport() static mlx5_eswitch_load/unload_vport()() functions are not used outside of eswitch.c. Make them static. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b7186387 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
net/mlx5: Make mlx5_esw_offloads_rep_load/unload() static mlx5_esw_offloads_rep_load/unload() functions are not used outside of eswitch_offloads.c. Make them static. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1161d22d |
|
22-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Register devcom device with switch id key Register devcom devices with switch id instead of guid. Devcom interface is used to sync between ports in the eswitch, e.g. Adding miss rules between the ports. New eswitch devices could have the same guid but a different switch id so its more correct to group according to switch id which is the identifier if the ports are on the same eswitch. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
88d162b4 |
|
03-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: Devcom, Infrastructure changes Update devcom infrastructure to be more generic, without depending on max supported ports definition or a device guid, and also more encapsulated so callers don't need to pass the register devcom component id per event call. Signed-off-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
70c36438 |
|
27-May-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: E-Switch, Use xarray for devcom paired device index To allow devcom events on E-Switch that is not a vport group manager, use vhca id as an index instead of device index which might be shared between several E-Switches. for example SF and its PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b3bd6892 |
|
13-Jun-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Fix the macro for accessing EC VF vports The last value is not set correctly. This results in representors not being created for all EC VFs when the base value is higher than 0. Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f405787a |
|
01-Jun-2023 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Create eswitch debugfs root directory Following patch in series uses the new directory for bridge FDB debugfs. The new directory is intended for all future eswitch-specific debugfs files. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a7719b29 |
|
07-Mar-2023 |
Daniel Jurgens <danielj@nvidia.com> |
net/mlx5: Add management of EC VF vports Add init, load, unload, and cleanup of the EC VF vports. This includes changes in how eswitch SRIOV is managed. Previous on an embedded CPU platform the number of VFs provided when enabling the eswitch was always 0, host VFs vports are handled in the eswitch functions change event handler. Now track the number of EC VFs as well, so they can be handled properly in the enable/disable flows. There are only 3 marks available for use in xarrays, all 3 were already in use for this use case. EC VF vports are in a known range so we can access them by index instead of marks. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4c103aea |
|
06-Jun-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: LAG, check if all eswitches are paired for shared FDB Shared FDB LAG can only work if all eswitches are paired. Also, whenever two eswitches are paired, devcom is marked as ready. Therefore, in case of device with two eswitches, checking devcom was sufficient. However, this is not correct for device with more than two eswitches, which will be introduced in downstream patch. Hence, check all eswitches are paired explicitly. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
90ca127c |
|
02-Jun-2023 |
Saeed Mahameed <saeedm@nvidia.com> |
net/mlx5: Devcom, introduce devcom_for_each_peer_entry Introduce generic APIs which will retrieve all peers. This API replace mlx5_devcom_get/release_peer_data which retrieve only a single peer. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8611df72 |
|
02-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, mark devcom as not ready when all eswitches are unpaired Whenever an eswitch is unpaired with another, the driver mark devcom as not ready. While this is correct in case we are pairing only two eswitches, in order to support pairing of more than two eswitches, driver need to mark devcom as not ready only when all eswitches are unpaired. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
014e4d48 |
|
02-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, generalize shared FDB creation Shared FDB creation is hard coded for only two eswitches. Generalize shared FDB creation so that any number of eswitches could create shared FDB. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5e0202eb |
|
22-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, Handle multiple master egress rules Currently, whenever a shared FDB is created, the slave eswitch is creating master egress rule to the master eswitch. In order to support more than two ports, which means there will be more than one slave eswitch, enlarge bounce_rule, which is used to create master egress rule, to an xarray. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9bee385a |
|
05-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, refactor FDB miss rule add/remove Currently, E-switch FDB have a single peer miss rule. In order to support more than one peer, refactor E-switch FDB to have peer miss rule per peer, and change the code to add/remove a rule from specific peer. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9be6c21f |
|
06-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5e: Handle offloads flows per peer Currently, E-switch offloads table have a list of all flows that create a peer_flow over the peer eswitch. In order to support more than one peer, extend E-switch offloads table peer_flow to hold an array of lists, where each peer have dedicate index via mlx5_get_dev_index(). Thereafter, extend original flow to hold an array of peers as well. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
71c93e37 |
|
25-May-2023 |
Jiri Pirko <jiri@resnulli.us> |
devlink: move port_fn_hw_addr_get/set() to devlink_port_ops Move port_fn_hw_addr_get/set() from devlink_ops into newly introduced devlink_port_ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
8c253dfc |
|
06-Feb-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-switch, Devcom, sync devcom events and devcom comp register devcom events are sent to all registered component. Following the cited patch, it is possible for two components, e.g.: two eswitches, to send devcom events, while both components are registered. This means eswitch layer will do double un/pairing, which is double allocation and free of resources, even though only one un/pairing is needed. flow example: cpu0 cpu1 ---- ---- mlx5_devlink_eswitch_mode_set(dev0) esw_offloads_devcom_init() mlx5_devcom_register_component(esw0) mlx5_devlink_eswitch_mode_set(dev1) esw_offloads_devcom_init() mlx5_devcom_register_component(esw1) mlx5_devcom_send_event() mlx5_devcom_send_event() Hence, check whether the eswitches are already un/paired before free/allocation of resources. Fixes: 09b278462f16 ("net: devlink: enable parallel ops on netlink interface") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2be5bd42 |
|
20-Mar-2023 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: Handle pairing of E-switch via uplink un/load APIs In case user switch a device from switchdev mode to legacy mode, mlx5 first unpair the E-switch and afterwards unload the uplink vport. From the other hand, in case user remove or reload a device, mlx5 first unload the uplink vport and afterwards unpair the E-switch. The latter is causing a bug[1], hence, handle pairing of E-switch as part of uplink un/load APIs. [1] In case VF_LAG is used, every tc fdb flow is duplicated to the peer esw. However, the original esw keeps a pointer to this duplicated flow, not the peer esw. e.g.: if user create tc fdb flow over esw0, the flow is duplicated over esw1, in FW/HW, but in SW, esw0 keeps a pointer to the duplicated flow. During module unload while a peer tc fdb flow is still offloaded, in case the first device to be removed is the peer device (esw1 in the example above), the peer net-dev is destroyed, and so the mlx5e_priv is memset to 0. Afterwards, the peer device is trying to unpair himself from the original device (esw0 in the example above). Unpair API invoke the original device to clear peer flow from its eswitch (esw0), but the peer flow, which is stored over the original eswitch (esw0), is trying to use the peer mlx5e_priv, which is memset to 0 and result in bellow kernel-oops. [ 157.964081 ] BUG: unable to handle page fault for address: 000000000002ce60 [ 157.964662 ] #PF: supervisor read access in kernel mode [ 157.965123 ] #PF: error_code(0x0000) - not-present page [ 157.965582 ] PGD 0 P4D 0 [ 157.965866 ] Oops: 0000 [#1] SMP [ 157.967670 ] RIP: 0010:mlx5e_tc_del_fdb_flow+0x48/0x460 [mlx5_core] [ 157.976164 ] Call Trace: [ 157.976437 ] <TASK> [ 157.976690 ] __mlx5e_tc_del_fdb_peer_flow+0xe6/0x100 [mlx5_core] [ 157.977230 ] mlx5e_tc_clean_fdb_peer_flows+0x67/0x90 [mlx5_core] [ 157.977767 ] mlx5_esw_offloads_unpair+0x2d/0x1e0 [mlx5_core] [ 157.984653 ] mlx5_esw_offloads_devcom_event+0xbf/0x130 [mlx5_core] [ 157.985212 ] mlx5_devcom_send_event+0xa3/0xb0 [mlx5_core] [ 157.985714 ] esw_offloads_disable+0x5a/0x110 [mlx5_core] [ 157.986209 ] mlx5_eswitch_disable_locked+0x152/0x170 [mlx5_core] [ 157.986757 ] mlx5_eswitch_disable+0x51/0x80 [mlx5_core] [ 157.987248 ] mlx5_unload+0x2a/0xb0 [mlx5_core] [ 157.987678 ] mlx5_uninit_one+0x5f/0xd0 [mlx5_core] [ 157.988127 ] remove_one+0x64/0xe0 [mlx5_core] [ 157.988549 ] pci_device_remove+0x31/0xa0 [ 157.988933 ] device_release_driver_internal+0x18f/0x1f0 [ 157.989402 ] driver_detach+0x3f/0x80 [ 157.989754 ] bus_remove_driver+0x70/0xf0 [ 157.990129 ] pci_unregister_driver+0x34/0x90 [ 157.990537 ] mlx5_cleanup+0xc/0x1c [mlx5_core] [ 157.990972 ] __x64_sys_delete_module+0x15a/0x250 [ 157.991398 ] ? exit_to_user_mode_prepare+0xea/0x110 [ 157.991840 ] do_syscall_64+0x3d/0x90 [ 157.992198 ] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Fixes: 1418ddd96afd ("net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7eb197fd |
|
23-Apr-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Use metadata matching for RoCE loopback rule Use metadata matching for RoCE loopback rule if device is configured to use metadata for source port matching. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fd745f4c |
|
10-Mar-2023 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Create per vport table based on devlink encap mode Currently when creating per vport table, create flags are hardcoded. Devlink encap mode is set based on user input and HW capability. Create per vport table based on devlink encap mode. Fixes: c796bb7cd230 ("net/mlx5: E-switch, Generalize per vport table API") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
38d9a740 |
|
21-Mar-2023 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() Remove unused function which also seems a duplicate of esw_port_metadata_set(). Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0a431418 |
|
20-Mar-2023 |
Maher Sanalla <msanalla@nvidia.com> |
Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" This reverts commit 606e6a72e29dff9e3341c4cc9b554420e4793f401 which exposes the vnic diagnostic counters via debugfs. Instead, The upcoming series will expose the same counters through devlink health reporter. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
acc10929 |
|
13-Apr-2023 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Allow blocking encap changes in eswitch Existing eswitch encap option enables header encapsulation. Unfortunately currently available hardware isn't able to perform double encapsulation, which can happen once IPsec packet offload tunnel mode is used together with encap mode set to BASIC. So as a solution for misconfiguration, provide an option to block encap changes, which will be used for IPsec packet offload. Reviewed-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
67027828 |
|
17-Feb-2023 |
Paul Blakey <paulb@nvidia.com> |
net/mlx5e: TC, Set CT miss to the specific ct action instance Currently, CT misses restore the missed chain on the tc skb extension so tc will continue from the relevant chain. Instead, restore the CT action's miss cookie on the extension, which will instruct tc to continue from the this specific CT action instance on the relevant filter's action list. Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use this miss mapping instead of the current chain miss object (CHAIN_MISS) for CT action misses. To restore this new miss mapping value, add a RX restore rule for each such mapping value. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Sholmo <ozsh@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
633ad4b2 |
|
21-Sep-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Remove redundant code for handling vlan actions Remove unused code which was used only with deprecated HW which didn't support vlan actions. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d2a651ef |
|
26-Jan-2023 |
Jiri Pirko <jiri@nvidia.com> |
net/mlx5: Move eswitch port metadata devlink param to flow eswitch code Move the param registration and handling code into the eswitch offloads code as they are related to each other. No point in having the devlink param registration done in separate file. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1f0ae22a |
|
12-Dec-2022 |
Moshe Shemesh <moshe@nvidia.com> |
net/mlx5: E-Switch, properly handle ingress tagged packets on VST Fix SRIOV VST mode behavior to insert cvlan when a guest tag is already present in the frame. Previous VST mode behavior was to drop packets or override existing tag, depending on the device version. In this patch we fix this behavior by correctly building the HW steering rule with a push vlan action, or for older devices we ask the FW to stack the vlan when a vlan is already present. Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e5b9642a |
|
06-Dec-2022 |
Shay Drory <shayd@nvidia.com> |
net/mlx5: E-Switch, Implement devlink port function cmds to control migratable Implement devlink port function commands to enable / disable migratable. This is used to control the migratable capability of the device. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
7db98396 |
|
06-Dec-2022 |
Yishai Hadas <yishaih@nvidia.com> |
net/mlx5: E-Switch, Implement devlink port function cmds to control RoCE Implement devlink port function commands to enable / disable RoCE. This is used to control the RoCE device capabilities. This patch implement infrastructure which will be used by downstream patches that will add additional capabilities. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
dcf19b9c |
|
24-Nov-2022 |
Maor Dickman <maord@nvidia.com> |
net/mlx5e: TC, Add offload support for trap with additional actions TC trap action offload is currently supported only when trap is the sole action in the flow. This patch remove this limitation by changing trap action offload to not use MLX5_ATTR_FLAG_SLOW_PATH flag and instead set the flow destination table explicitly to be the slow table. This will allow offload of the additional actions. TC flow example: tc filter add dev $REP2 protocol ip prio 2 root \ flower skip_sw dst_mac $mac0 \ action mirred egress redirect dev $REP3 \ action pedit ex munge eth dst set $mac2 pipe \ action trap Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e87c6a83 |
|
03-Aug-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Fix duplicate lag creation If creating bond first and then enabling sriov in switchdev mode, will hit the following syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:778:(pid 25543): CREATE_LAG(0x840) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x7d49cb), err(-22) The reason is because the offending patch removes eswitch mode none. In vf lag, the checking of eswitch mode none is replaced by checking if sriov is enabled. But when driver enables sriov, it triggers the bond workqueue task first and then setting sriov number in pci_enable_sriov(). So the check fails. Fix it by checking if sriov is enabled using eswitch internal counter that is set before triggering the bond workqueue task. Fixes: f019679ea5f2 ("net/mlx5: E-switch, Remove dependency between sriov and eswitch mode") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
430e2d5e |
|
18-Jul-2022 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Move send to vport meta rule creation Move the creation of the rules from offloads fdb table init to per rep vport init. This way the driver will creating the send to vport meta rule on any representor, e.g. SF representors. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8ea7bcf6 |
|
05-Apr-2022 |
Jianbo Liu <jianbol@nvidia.com> |
net/mlx5: E-Switch, Add default drop rule for unmatched packets The ft_offloads table serves to steer packets, which are from the eswitch, to the representor associated with the packets' source vport. Previously, if a packet's source vport or metadata was not associated with any representor, it was forwarded to the uplink representor. The representor got packets it shouldn't have as they weren't coming from the uplink vport. One such effect of this breakage can be observed if the uplink representor is attached to a bridge, where such illegal packets will be broadcast to the remaining ports, flooding the switch with illegal packets. In the case where IB loopback (e.g, SNAP) is enabled, all transmitted packets would be looped back, and received by the uplink representor, and result in an infinite feedback loop. Therefore, block this hole by adding a default drop rule to the ft_offloads table, so that all unmatched packets with no associated representor are dropped. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
606e6a72 |
|
18-May-2022 |
Michael Guralnik <michaelgur@nvidia.com> |
net/mlx5: Expose vnic diagnostic counters for eswitch managed vports Expose on vport group managers debug counters for their managed vports. Counters are exposed through debugfs, the directory will be present only for functions that are eswitch managers and only counters that are supported on their specific HW/FW will be exposed. Example: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/ pf sf_8 vf_0 vf_1 $ ls -l /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow List of all counter added: total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun -number of time an EQ mapped to async events was overrun. comp_eq_overrun - number of time an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed dues to any reason other than quota exceeded. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b6f2846a |
|
29-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch: Change eswitch mode only via devlink command Enable or disable switchdev according to the eswitch mode set by devlink command. So it is not changed by other functions anymore. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f019679e |
|
29-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Remove dependency between sriov and eswitch mode Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
fbd43b72 |
|
05-May-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Introduce flag to indicate if fdb table is created Introduce flag to indicate if fdb table is created as a pre-step to prepare for removing dependency between sriov and eswitch mode in the downstream patches. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ea5872dd |
|
10-Feb-2022 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Introduce flag to indicate if vport acl namespace is created Eswitch vport acl namespace is needed when loading vfs. There is no need to free and reallocate it when switching eswitch mode. Introduce flag to indicate if it is created or not. When needed, create it. Only free it when the driver is unloaded or in bare metal mode. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ec2fa47d |
|
14-Dec-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Lag, use lag lock Use a lag specific lock instead of depending on external locks to synchronise the lag creation/destruction. With this, taking E-Switch mode lock is no longer needed for syncing lag logic. Cleanup any dead code that is left over and don't export functions that aren't used outside the E-Switch core code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4202ea95 |
|
01-Mar-2022 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Lag, move E-Switch prerequisite check into lag code There is no need to expose E-Switch function for something that can be checked with already present API inside lag code. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
697319b2 |
|
15-Mar-2022 |
Maor Dickman <maord@nvidia.com> |
net/mlx5e: MPLSoUDP decap, use vlan push_eth instead of pedit Currently action pedit of source and destination MACs is used to fill the MACs in L2 push step in MPLSoUDP decap offload, this isn't aligned to tc SW which use vlan eth_push action to do this. To fix that, offload support for vlan veth_push action is added together with mpls pop action, and deprecate the use of pedit of MACs. Flow example: filter protocol mpls_uc pref 1 flower chain 0 filter protocol mpls_uc pref 1 flower chain 0 handle 0x1 eth_type 8847 mpls_label 555 enc_dst_port 6635 in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 used_hw_stats delayed action order 2: mpls pop protocol ip pipe index 2 ref 1 bind 1 used_hw_stats delayed action order 3: vlan push_eth dst_mac de:a2:ec:d6:69:c8 src_mac de:a2:ec:d6:69:c8 pipe index 2 ref 1 bind 1 used_hw_stats delayed action order 4: mirred (Egress Redirect to device enp8s0f0_0) stolen index 2 ref 1 bind 1 used_hw_stats delayed Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
#
1749c4c5 |
|
29-Nov-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-switch, add drop rule support to ingress ACL Support inserting an ingress ACL drop rule on the uplink in switchdev mode. This will be used by downstream patches to offload active-backup lag mode. The drop rule (if created) is the first rule in the ACL. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e5d4e1da |
|
19-Dec-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Refactor eswitch attr flags to just attr flags The flags are flow attrs and not esw specific attr flags. Refactor to remove the esw prefix and move from eswitch.h to en_tc.h where struct mlx5_flow_attr exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
08ab0ff4 |
|
07-Dec-2021 |
Shaokun Zhang <zhangshaokun@hisilicon.com> |
net/mlx5: Remove the repeated declaration Function 'mlx5_esw_vport_match_metadata_supported' and 'mlx5_esw_offloads_vport_metadata_set' are declared twice, so remove the repeated declaration and blank line. Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
85c5f7c9 |
|
21-Sep-2021 |
Dmytro Linkin <dlinkin@nvidia.com> |
net/mlx5: E-switch, Create QoS on demand Don't create eswitch QoS (root TSAR) on switch mode change. Create it on first child TSAR object creation - vport or rate group. Keep track root TSAR references and release root TSAR with last object deletion. No need to check for QoS is enabled when installing tc matchall filter. Remove related helper function due to no users of it. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
166f431e |
|
29-Apr-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5e: Add indirect tc offload of ovs internal port Register callbacks for tc blocks of ovs internal port devices. This allows an indirect offloading rules that apply on such devices as the filter device. In case a rule is added to a tc block of an internal port, the mlx5 driver will implicitly add a matching on the internal port's unique vport metadata value to the rule's matching list. Therefore, only packets that previously hit a rule that redirects to an internal port and got the vport metadata overwritten to the internal port's unique metadata, can match on such indirect rule. Offloading of both ingress and egress tc blocks of internal ports is supported as opposed to other devices where only ingress block offloading is supported. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
27484f71 |
|
08-Jan-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5e: Offload tc rules that redirect to ovs internal port Allow offloading rules that redirect to ovs internal port ingress and egress. To support redirect to ingress device, offloading of REDIRECT_INGRESS action is added. When a tc rule redirects to ovs internal port, the hw rule will overwrite the input vport value in reg_c0 with a new vport metadata value that is mapped for this internal port using the internal port mapping api that is introduce in previous patches. After that the hw rule will redirect the packet to the root table to continue processing with the new vport metadata value. The new vport metadata value indicates that this packet is now arriving through an internal port and therefore should be processed using rules that apply on the same internal port as the filter device. Therefore, following rules that apply on this internal port will have to match on the same vport metadata value as part of their matching keys to make sure the packet belongs to the internal port. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4f4edcc2 |
|
29-Apr-2021 |
Ariel Levkovich <lariel@nvidia.com> |
net/mlx5: E-Switch, Add ovs internal port mapping to metadata support Adding infrastructure to map ovs internal port device to vport match metadata to support offload of rules with internal port as the filter device or as the destination device. The infrastructure allows adding and removing internal port device to an eswitch database and getting a unique vport metadata value to be placed and match on in reg_c0 when offloading rules that are coming from or going to an internal port. The new int port metadata can be written to the source port register in HW to indicate that current source port of the packet is the internal port and not one of the actual HW vports (uplink or VF). Using this method, it is possible to offload TC rules with an OVS internal port as their destination port (overwriting the src vport register) or as the filter port (matching on the value of the src vport register and making sure it matches to the internal port's value). There is also a need to handle a miss case where the packet's src port value was changed in HW to an internal port but a following rule which matches on this new src port value wasn't found in HW. In such case, the packet will be forwarded to the driver with metadata which allows driver to restore the info of the internal port's netdevice. Once this info is restored, the uplink driver can forward the packet to the relevant netdevice in SW. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d40bfedd |
|
05-Sep-2021 |
Maor Dickman <maord@nvidia.com> |
net/mlx5: E-Switch, Increase supported number of forward destinations to 32 Increase supported number of forward destinations in the same rule, local and remote, from 2 to 32. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6ba2e2b3 |
|
07-Sep-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Support accept action Support TC generic 'accept' action in mlx5 by introducing MLX5_ESW_ATTR_FLAG_ACCEPT attribute flag. Flag has similar semantics to existing MLX5_ESW_ATTR_FLAG_SLOW_PATH flag, however, dedicated flag is required because existing 'slow path' flag can be flipped by tunneling subsystem when neighbor changes state. Introduce new helper function mlx5_esw_attr_flags_skip() to check whether attribute flags for 'slow path' or 'accept' action are set and use it in eswitch code instead of direct bit manipulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0fe132ea |
|
31-May-2021 |
Dmytro Linkin <dlinkin@nvidia.com> |
net/mlx5: E-switch, Allow to add vports to rate groups Implement eswitch API that allows updating rate groups. If group pointer is NULL, then move the vport to internal unlimited group zero. Implement devlink_ops->rate_parent_node_set() callback in the terms of the new eswitch group update API. Enable QoS for all group's elements if a group has allocated BW share. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f47e04eb |
|
31-May-2021 |
Dmytro Linkin <dlinkin@nvidia.com> |
net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups Provide eswitch API to allow controlling group rate limits. Use it to implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set(). The share rate will create relative bandwidth share on the groups level while within the group the user can set shared rate on the member vports of that group and this rate will be relative to the group's share rate. The group with the highest shared rate will get a BW share of 100 and the rest of the groups will get a value that reflects the ratio between their share rate and the maximum share rate. Example: Created four rate groups with tx_share limits: $ devlink port function rate add \ pci/0000:06:00.0/group_1 tx_share 30gbit $ devlink port function rate add \ pci/0000:06:00.0/group_2 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_3 tx_share 20gbit $ devlink port function rate add \ pci/0000:06:00.0/group_4 tx_share 10gbit Assuming link speed is 50 Gbit/sec ratio divider will be 50 / (30+20+20+10) = 0.625. Normalized rate values for the groups: <group_1> 30 * 0.625 = 18.75 Gbit/sec <group_2> 20 * 0.625 = 12.5 Gbit/sec <group_3> 20 * 0.625 = 12.5 Gbit/sec <group_4> 10 * 0.625 = 6.25 Gbit/sec Rate group with unlimited tx_share rate will receive minimum BW value (1Mbit/sec) if presented any group with tx_share rate limit. This allow to not drop all packets in case of heavy traffic. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
1ae258f8 |
|
31-May-2021 |
Dmytro Linkin <dlinkin@nvidia.com> |
net/mlx5: E-switch, Introduce rate limiting groups API Extend eswitch API with rate limiting groups: - Define new struct mlx5_esw_rate_group that is used to hold all internal group data. - Implement functions that allow creation, destruction and cleanup of groups. - Assign all vports to internal unlimited zero group by default. This commit lays the groundwork for group rate limiting by implementing devlink_ops->rate_node_{new|del}() callbacks to support creating and deleting groups through devlink rate node objects. APIs that allows setting rates and adding/removing members are implemented in following patches. Co-developed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2d116e3e |
|
28-May-2021 |
Dmytro Linkin <dlinkin@nvidia.com> |
net/mlx5: E-switch, Move QoS related code to dedicated file Move eswitch QoS related code into dedicated file. Provide eswitch API to access this code meaning it is isolated and restricted to be used only by eswitch.c. Exception is legacy NDO vf set rate, which moved to esw/legacy.c. Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ee950e5d |
|
30-Apr-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Restore tunnel info for sample offload Currently the sample offload actions send the encapsulated packet to software. sFlow expects tunneled packets to be decapsulated while having the tunnel properties on the skb metadata fields. Reuse the functions used by connection tracking to map the outer header properties to a unique id. The next patch will use that id to restore the tunnel information of decapsulated packets onto the skb. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
bcd6740c |
|
18-Aug-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: Move sample attribute to flow attribute Currently it is in eswitch attribute. Move it to flow attribute to reflect the change in previous patch. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0027d70c |
|
18-Aug-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: Move esw/sample to en/tc/sample Module sample belongs to en/tc instead of esw. Move it and rename accordingly. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
3ee6233e |
|
17-Jun-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair Following patches in series allow traffic between vports of different eswitch instances, which requires addressing bridge port by vport_num+esw_owner_vhca_id pair since vport_num is only unique per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with 'esw_owner_vhca_id' field and use it as part of key for mlx5_esw_bridge->vports xarray. With this change we can't rely on switchdev_handle_port_obj_add() helper to get mlx5 representor from stacked device because we need specifically representor from parent eswitch that registered the callback to obtain correct esw_owner_vhca_id. The helper doesn't allow passing additional parameters to predicate function and doesn't provide access to the notifier block to obtain eswitch through br_offloads. Implement custom helpers to obtain mlx5 representor and use them in mlx5_esw_bridge_port_obj_{add|del|attr_set}() implementations. Remove direct pointer to parent bridge from struct mlx5_vport as it is no longer needed. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
82564f6c |
|
08-Aug-2021 |
Leon Romanovsky <leon@kernel.org> |
devlink: Simplify devlink port API calls Devlink port already has pointer to the devlink instance and all API calls that forward these devlink ports to the drivers perform same "devlink_port->devlink" assignment before actual call. This patch removes useless parameter and allows us in the future to create specific devlink_port_ops to manage user space access with reliable ops assignment. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
db202995 |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: E-Switch, add logic to enable shared FDB Shared FDB allows to direct traffic from all the vports in the HCA to a single eswitch. In order to do that three things are needed. 1) Point the ingress ACL of the slave uplink to that of the master. With this, wire traffic from both uplinks will reach the same eswitch with the same metadata where a single steering rule can catch traffic from both ports. 2) Set the FDB root flow table of the slave's eswitch to that of the master. As this flow table can change dynamically make sure to sync it on any set root flow table FDB command. This will make sure traffic from SFs, VFs, ECPFs and PFs reach the master eswitch. 3) Split wire traffic at the eswitch manager egress ACL so that it's directed to the native eswitch manager. We only treat wire traffic from both ports the same at the eswitch level. If such traffic wasn't handled in the eswitch it needs to reach the right representor to be processed by software. For example LACP packets should *always* reach the right uplink representor for correct operation. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cac1eb2c |
|
03-Aug-2021 |
Mark Bloch <mbloch@nvidia.com> |
net/mlx5: Lag, properly lock eswitch if needed Currently when doing hardware lag we check the eswitch mode but as this isn't done under a lock the check isn't valid. As the code needs to sync between two different devices an extra care is needed. - When going to change eswitch mode, if hardware lag is active destroy it. - While changing eswitch modes block any hardware bond creation. - Delay handling bonding events until there are no mode changes in progress. - When attaching a new mdev to lag, block until there is no mode change in progress. In order for the mode change to finish the interface lock will have to be taken. Release the lock and sleep for 100ms to allow forward progress. As this is a very rare condition (can happen if the user unbinds and binds a PCI function while also changing eswitch mode of the other PCI function) it has no real world impact. As taking multiple eswitch mode locks is now required lockdep will complain about a possible deadlock. Register a key per eswitch to make lockdep happy. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Mark Zhang <markzhang@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
2198b932 |
|
03-Aug-2021 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Use shared mappings for restoring from metadata FTEs are added with mapped metadata which is saved per eswitch. When uplink reps are bonded and we are in a single FDB mode, we could fail to find metadata which was stored on one eswitch mapping but not the other or with a different id. To resolve this issue use shared mapping between eswitch ports. We do not have any conflict using a single mapping, for a type, between the ports. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
740452e0 |
|
25-Apr-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32 The offending refactor commit uses u16 chain wrongly. Actually, it should be u32. Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") CC: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
19e9bfa0 |
|
02-Apr-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Bridge, add offload infrastructure Create new files bridge.{c|h} in en/rep directory that implement bridge interaction with representor netdevices and handle required events/notifications, bridge.{c|h} in esw directory that implement all necessary eswitch offloading infrastructure and works on vport/eswitch level. Provide new kconfig MLX5_BRIDGE which is automatically selected when both kernel bridge and mlx5 eswitch configs are enabled. Provide basic infrastructure for bridge offloads: - struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure that encapsulates generic bridge-offloads data (notifier blocks, ingress flow table/group, etc.) that is created/deleted on enable/disable eswitch offloads. - struct mlx5_esw_bridge - per-bridge structure that encapsulates per-bridge data (reference counter, FDB, egress flow table/group, etc.) that is created when first eswitch represetor is attached to new bridge and deleted when last representor is removed from the bridge as a result of NETDEV_CHANGEUPPER event. The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB namespace. The new priority is between tc-miss and slow path priorities. Priority consist of two levels: the ingress table that is global per eswitch and matches incoming packets by src_mac/vid and redirects them to next level (egress table) that is chosen according to ingress port bridge membership and matches on dst_mac/vid in order to redirect packet to vport according to the following diagram: + | +---------v----------+ | | | FDB_TC_OFFLOAD | | | +---------+----------+ | | +---------v----------+ | | | FDB_FT_OFFLOAD | | | +---------+----------+ | | +---------v----------+ | | | FDB_TC_MISS | | | +---------+----------+ | +--------------------------------------+ | | | | +------+ | | | | | +------v--------+ FDB_BR_OFFLOAD | | | INGRESS_TABLE | | | +------+---+----+ | | | | match | | | +---------+ | | | | | +-------+ | | +-------v-------+ match | | | | | | EGRESS_TABLE +------------> vport | | | +-------+-------+ | | | | | | | +-------+ | | miss | | | +------+------+ | | | | +--------------------------------------+ | | +---------v----------+ | | | FDB_SLOW_PATH | | | +---------+----------+ | v Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ec3be887 |
|
04-Mar-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: Create TC-miss priority and table In order to adhere to kernel software datapath model bridge offloads must come after TC and NF FDBs. Following patches in this series add new FDB priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload is implemented with unmanaged tables, its miss path is not automatically connected to next priority and requires the code to manually connect with slow table. To keep bridge offloads encapsulated and not mix it with eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD and FDB_SLOW_PATH: + | +---------v----------+ | | | FDB_TC_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_FT_OFFLOAD | | | +---------+----------+ | | | +---------v----------+ | | | FDB_TC_MISS | | | +---------+----------+ | | | +---------v----------+ | | | FDB_SLOW_PATH | | | +---------+----------+ | v Initialize the new priority with single default empty managed table and use the table as TC/NF miss patch instead of slow table. This approach allows bridge offloads to be created as new FDB namespace priority between FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any other modules since miss path of managed TC-miss table is automatically wired to next priority. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
f1b9acd3 |
|
08-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Extend SF table for additional SF id range Extended the SF table to cover additioanl SF id range of external controller. A user optionallly provides the external controller number when user wants to create SF on the external controller. An example on eswitch system: $ devlink dev eswitch set pci/0033:01:00.0 mode switchdev $ devlink port show pci/0033:01:00.0/196607: type eth netdev enP51p1s0f0np0 flavour physical port 0 splittable false pci/0033:01:00.0/131072: type eth netdev eth0 flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr 00:00:00:00:00:00 $ devlink port add pci/0033:01:00.0 flavour pcisf pfnum 0 sfnum 77 controller 1 pci/0033:01:00.0/163840: type eth netdev eth1 flavour pcisf controller 1 pfnum 0 sfnum 77 external true splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
87bd418e |
|
02-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Consider SF ports of host PF Query SF vports count and base id of host PF from the firmware. Account these ports in the total port calculation whenever it is non zero. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
47dd7e60 |
|
18-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Use xarray for vport number to vport and rep mapping Currently vport number to vport and its representor are mapped using an array and an index. Vport numbers of different types of functions are not contiguous. Adding new such discontiguous range using index and number mapping is increasingly complex and hard to maintain. Hence, maintain an xarray of vport and rep whose lookup is done based on the vport number. Each VF and SF entry is marked with a xarray mark to identify the function type. Additionally PF and VF needs special handling for legacy inline mode. They are additionally marked as host function using additional HOST_FN mark. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
9f8c7100 |
|
02-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Prepare to return total vports from eswitch struct Total vports are already stored during eswitch initialization. Instead of calculating everytime, read directly from eswitch. Additionally, host PF's SF vport information is available using QUERY_HCA_CAP command. It is not available through HCA_CAP of the eswitch manager PF. Hence, this patch prepares the return total eswitch vport count from the existing eswitch struct. This further helps to keep eswitch port counting macros and logic within eswitch. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b55b3538 |
|
26-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Move legacy code to a individual file Currently eswitch offers two modes. Legacy and offloads. Offloads code is already in its own file eswitch_offloads.c However eswitch.c contains the eswitch legacy code and common infrastructure code. To enable future extensions and to better manage generic common eswitch infrastructure code, move the legacy code to its own legacy.c file. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b16f2bb6 |
|
26-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Convert a macro to a helper routine Convert ESW_ALLOWED macro to a helper routine so that it can be used in other eswitch files. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
6308a5f0 |
|
02-Mar-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, Make vport number u16 Vport number is 16-bit field in hardware. Make it u16. Move location of vport in the structure so that it reduces a hole in the structure. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7bf481d7 |
|
30-Oct-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, let user to enable disable metadata Currently each packet inserted in eswitch is tagged with a internal metadata to indicate source vport. Metadata tagging is not always needed. Metadata insertion is needed for multi-port RoCE, failover between representors and stacked devices. In many other cases, metadata enablement is not needed. Metadata insertion slows down the packet processing rate of the E-switch when it is in switchdev mode. Below table show performance gain with metadata disabled for VXLAN offload rules in both SMFS and DMFS steering mode on ConnectX-5 device. ---------------------------------------------- | steering | metadata | pkt size | rx pps | | mode | | | (million) | ---------------------------------------------- | smfs | disabled | 128Bytes | 42 | ---------------------------------------------- | smfs | enabled | 128Bytes | 36 | ---------------------------------------------- | dmfs | disabled | 128Bytes | 42 | ---------------------------------------------- | dmfs | enabled | 128Bytes | 36 | ---------------------------------------------- Hence, allow user to disable metadata using driver specific devlink parameter. Metadata setting of the eswitch is applicable only for the switchdev mode. Example to show and disable metadata before changing eswitch mode: $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata pci/0000:06:00.0: name esw_port_metadata type driver-specific values: cmode runtime value true $ devlink dev param set pci/0000:06:00.0 \ name esw_port_metadata value false cmode runtime $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> --- changelog: v1->v2: - added performance numbers in commit log - updated commit log and documentation for switchdev mode - added explicit note on when user can disable metadata in documentation
|
#
f94d6389 |
|
21-Sep-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Add support to offload sample action The following diagram illustrates the hardware model for tc sample action: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +----------------------------------------+ + sample table + + default table per <vport, chain, prio> + +-----------------------------+ +----------------------------------------+ + forward to management vport + + original match + +-----------------------------+ +----------------------------------------+ + other actions + +----------------------------------------+ The sample action is translated to a goto flow table object destination which samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet to the eswitch manager port (that will be processed by software). The second copy is processed by the default table which executes the subsequent actions. The default table is created per <vport, chain, prio> tuple as rules with different prios and chains may overlap. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
be9dc004 |
|
25-Jan-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Handle sampled packets Mark the sampled packets with a sample restore object. Send sampled packets using the psample api. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
41c2fd94 |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5e: TC, Parse sample action Parse TC sample action and save sample parameters in flow attribute data structure. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c9355682 |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: Instantiate separate mapping objects for FDB and NIC tables Currently, the u32 chain id is mapped to u16 value which is stored on the lower 16 bits of reg_c0 for FDB and reg_b for NIC tables. The mapping is internally maintained by the chains object. However, with the introduction of reg_c0 objects the fdb may store more than just the chain id on reg_c0. This is not relevant for NIC tables. Separate the chains mapping instantiation for FDB and NIC tables. Remove the mapping from the chains object. For FDB tables, create the mapping per eswitch. For NIC tables, create the mapping per tc table. Pass the corresponding mapping pointer when creating the chains object. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
a91d98a0 |
|
10-Sep-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: Map register values to restore objects Currently reg_c0 lower 16 bits and reg_b are used to store the chain id that missed in FDB and NIC tables accordingly. However, the registers' values may index a restore object, rather than a single u32 value. Different object types can be used to restore mutually exclusive contexts such as chain id and sample group id. Use the mapping object to associate an index with a restore object as a prestep for supporting additional restore types. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c796bb7c |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Generalize per vport table API Currently, per vport table was used only for port mirroring actions. However, sample action will also require a per vport table instance. Generalize the vport table API to work with multiple namespaces where each namespace manages its own vport table instance. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
0a9e2307 |
|
14-Jan-2021 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Rename functions to follow naming convention. Public api starts with mlx5 and remove mlx5 for non-public api. Signed-off-by: Chris Mi <cmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
4c7f4028 |
|
30-Aug-2020 |
Chris Mi <cmi@nvidia.com> |
net/mlx5: E-switch, Move vport table functions to a new file Currently, the vport table functions are in common eswitch offload file. This file is too big. Move the vport table create, delete and lookup functions to a separate file. Put the file in esw directory. Pre-step for generalizing its functionality for serving both the mirroring and the sample features. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
233dd7d6 |
|
03-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, move QoS specific fields to existing qos struct Function QoS related fields are already defined in qos related struct. min and max rate are left out to mlx5_vport_info struct. Move them to existing qos struct. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
b47e1056 |
|
02-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, cut down mlx5_vport_info structure size by 8 bytes Structure mlx5_vport_info consumes 40 bytes of space due to a hole in it. After packing it reduces to 32 bytes. Currently: pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o struct mlx5_vport_info { u8 mac[6]; /* 0 6 */ u16 vlan; /* 6 2 */ u8 qos; /* 8 1 */ /* XXX 7 bytes hole, try to pack */ u64 node_guid; /* 16 8 */ int link_state; /* 24 4 */ u32 min_rate; /* 28 4 */ u32 max_rate; /* 32 4 */ bool spoofchk; /* 36 1 */ bool trusted; /* 37 1 */ /* size: 40, cachelines: 1, members: 9 */ /* sum members: 31, holes: 1, sum holes: 7 */ /* padding: 2 */ /* last cacheline: 40 bytes */ }; After packing: $ pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o struct mlx5_vport_info { u8 mac[6]; /* 0 6 */ u16 vlan; /* 6 2 */ u64 node_guid; /* 8 8 */ int link_state; /* 16 4 */ u32 min_rate; /* 20 4 */ u32 max_rate; /* 24 4 */ u8 qos; /* 28 1 */ u8 spoofchk:1; /* 29: 0 1 */ u8 trusted:1; /* 29: 1 1 */ /* size: 32, cachelines: 1, members: 9 */ /* padding: 2 */ /* bit_padding: 6 bits */ /* last cacheline: 32 bytes */ }; Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
e591605f |
|
03-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, move QoS specific fields to existing qos struct Function QoS related fields are already defined in qos related struct. min and max rate are left out to mlx5_vport_info struct. Move them to existing qos struct. Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
cadb129f |
|
02-Feb-2021 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-Switch, cut down mlx5_vport_info structure size by 8 bytes Structure mlx5_vport_info consumes 40 bytes of space due to a hole in it. After packing it reduces to 32 bytes. Currently: pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o struct mlx5_vport_info { u8 mac[6]; /* 0 6 */ u16 vlan; /* 6 2 */ u8 qos; /* 8 1 */ /* XXX 7 bytes hole, try to pack */ u64 node_guid; /* 16 8 */ int link_state; /* 24 4 */ u32 min_rate; /* 28 4 */ u32 max_rate; /* 32 4 */ bool spoofchk; /* 36 1 */ bool trusted; /* 37 1 */ /* size: 40, cachelines: 1, members: 9 */ /* sum members: 31, holes: 1, sum holes: 7 */ /* padding: 2 */ /* last cacheline: 40 bytes */ }; After packing: $ pahole -C mlx5_vport_info drivers/net/ethernet/mellanox/mlx5/core/eswitch.o struct mlx5_vport_info { u8 mac[6]; /* 0 6 */ u16 vlan; /* 6 2 */ u64 node_guid; /* 8 8 */ int link_state; /* 16 4 */ u32 min_rate; /* 20 4 */ u32 max_rate; /* 24 4 */ u8 qos; /* 28 1 */ u8 spoofchk:1; /* 29: 0 1 */ u8 trusted:1; /* 29: 1 1 */ /* size: 32, cachelines: 1, members: 9 */ /* padding: 2 */ /* bit_padding: 6 bits */ /* last cacheline: 32 bytes */ }; Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
7dc84de9 |
|
16-Sep-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Protect changing mode while adding rules We re-use the native NIC port net device instance for the Uplink representor, a driver currently cannot unbind TC setup callback actively, hence protect changing E-Switch mode while adding rules. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c55479d0 |
|
16-Sep-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5: E-Switch, Change mode lock from mutex to rw semaphore E-Switch mode change routine will take the write lock to prevent any consumer to access the E-Switch resources while E-Switch is going through a mode change. In the next patch E-Switch consumers (e.g vport representors) will take read_lock prior to accessing E-Switch resources to prevent E-Switch mode changing in the middle of the operation. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
5a65d85d |
|
21-Oct-2020 |
Roi Dayan <roid@nvidia.com> |
net/mlx5e: Register nic devlink port with switch id We will re-use the native NIC port net device instance for the Uplink representor. Since the netdev will be kept registered while we engage switchdev mode also the devlink will be kept registered. Register the nic devlink port with switch id so it will be available when changing profiles. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8914add2 |
|
25-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Handle FIB events to update tunnel endpoint device Process FIB route update events to dynamically update the stack device rules when tunnel routing changes. Use rtnl lock to prevent FIB event handler from running concurrently with neigh update and neigh stats workqueue tasks. Use encap_tbl_lock mutex to synchronize with TC rule update path that doesn't use rtnl lock. FIB event workflow for encap flows: - Unoffload all flows attached to route encaps from slow or fast path depending on encap destination endpoint neigh state. - Update encap IP header according to new route dev. - Update flows mod_hdr action that is responsible for overwriting reg_c0 source port bits to source port of new underlying VF of new route dev. This step requires changing flow create/delete code to save flow parse attribute mod_hdr_acts structure for whole flow lifetime instead of deallocating it after flow creation. Refactor mod_hdr code to allow saving id of individual mod_hdr actions and updating them with dedicated helper. - Offload all flows to either slow or fast path depending on encap destination endpoint neigh state. FIB event workflow for decap flows: - Unoffload all route flows from hardware. When last route flow is deleted all indirect table rules for the route dev will also be deleted. - Update flow attr decap_vport and destination MAC according to underlying VF of new rote dev. - Offload all route flows back to hardware creating new indirect table rules according to updated flow attribute data. Extract some neigh update code to helper functions to be used by both neigh update and route update infrastructure. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
777bb800 |
|
21-Sep-2020 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Create route entry infrastructure Implement dedicated route entry infrastructure to be used in following patch by route update event. Both encap (indirectly through their corresponding encap entries) and decap (directly) flows are attached to routing entry. Since route update also requires updating encap (route device MAC address is a source MAC address of tunnel encapsulation), same encap_tbl_lock mutex is used for synchronization. The new infrastructure looks similar to existing infrastructures for shared encap, mod_hdr and hairpin entries: - Per-eswitch hash table is used for quick entry lookup. - Flows are attached to per-entry linked list and hold reference to entry during their lifetime. - Atomic reference counting and rcu mechanisms are used as synchronization primitives for concurrent access. The infrastructure also enables connection tracking on stacked devices topology by attaching CT chain 0 flow on tunneling dev to decap route entry. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8e404fef |
|
31-Aug-2020 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: Match recirculated packet miss in slow table using reg_c1 Previous patch in series that implements stack devices RX path implements indirect table rules that match on tunnel VNI. After such rule is created all tunnel traffic is recirculated to root table. However, recirculated packet might not match on any rules installed in the table (for example, when IP traffic follows ARP traffic). In that case packets appear on representor of tunnel endpoint VF instead being redirected to the VF itself. Extend slow table with additional flow group that matches on reg_c0 (source port value set by indirect tables implemented by previous patch in series) and reg_c1 (special 0xFFF mark). When creating offloads fdb tables, install one rule per VF vport to match on recirculated miss packets and redirect them to appropriate VF vport. Modify indirect tables code to also rewrite reg_c1 with special 0xFFF mark. Implementation reuses reg_c1 tunnel id bits. This is safe to do because recirculated packets are always matched before decapsulation. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
34ca6535 |
|
24-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5: E-Switch, Indirect table infrastructure Indirect table infrastructure is used to allow fully processing VF tunnel traffic in hardware. Kernel software model uses two TC rules for such traffic: UL rep to tunnel device, then tunnel VF rep to destination VF rep. To implement such pipeline driver needs to program the hardware after matching on UL rule to overwrite source vport from UL to tunnel VF and recirculate the packet to the root table to allow matching on the rule installed on tunnel VF. For this indirect table matches all encapsulated traffic by tunnel parameters and all other IP traffic is sent to tunnel VF by the miss rule. Indirect table API overview: - mlx5_esw_indir_table_{init|destroy}() - init and destroy opaque indirect table object. - mlx5_esw_indir_table_get() - get or create new table according to vport id and IP version. Table has following pre-created groups: recirculation group with match on ethertype and VNI (rules that match encapsulated packets are installed to this group) and forward group with default/miss rule that forwards to vport of tunnel endpoint VF (rule for regular non-encapsulated packets). - mlx5_esw_indir_table_put() - decrease reference to the indirect table and matching rule (for encapsulated traffic). - mlx5_esw_indir_table_needed() - check that in_port is an uplink port and out_port is VF on the same eswitch, verify that the rule is for IP traffic and source port rewrite functionality can be used. - mlx5_esw_indir_table_decap_vport() - function returns decap vport of flow attribute. Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
10742efc |
|
21-Jan-2021 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: VF tunnel TX traffic offloading When tunnel endpoint is on VF, driver still assumes that endpoint is on uplink and incorrectly configures encap rule offload according to that assumption. As a result, traffic is sent directly to the uplink and rules installed on representor of tunnel endpoint VF are ignored. Implement following changes to allow offloading tx traffic with tunnel endpoint on VF: - For tunneling flows perform route lookup on route and out devices pair. If out device is uplink and route device is VF of same physical port, then modify packet reg_c_0 metadata register (source port) with the value of VF vport. Use eswitch vhca_id->vport mapping introduced in one of previous patches in the series to obtain vport from route netdevice. - Recirculate encapsulated packets to VF vport in order to apply any flow rules installed on VF representor that match on encapsulated traffic. Only enable support for this functionality when all following conditions are true: - Hardware advertises capability to preserve reg_c_0 value on packet recirculation. - Vport metadata matching is enabled. - Termination tables are to be used by the flow. Example TC rules for VF tunnel traffic: 1. Rule that redirects packets from UL to VF rep that has the tunnel endpoint IP address: $ tc -s filter show dev enp8s0f0 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac 16:c9:a0:2d:69:2c src_mac 0c:42:a1:58:ab:e4 eth_type ipv4 ip_flags nofrag in_hw in_hw_count 1 action order 1: mirred (Egress Redirect to device enp8s0f0_0) stolen index 3 ref 1 bind 1 installed 377 sec used 0 sec Action statistics: Sent 114096 bytes 952 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 114096 bytes 952 pkt backlog 0b 0p requeues 0 cookie 878fa48d8c423fc08c3b6ca599b50a97 no_percpu used_hw_stats delayed 2. Rule that decapsulates the tunneled flow and redirects to destination VF representor: $ tc -s filter show dev vxlan_sys_4789 ingress filter protocol ip pref 4 flower chain 0 filter protocol ip pref 4 flower chain 0 handle 0x1 dst_mac ca:2e:a7:3f:f5:0f src_mac 0a:40:bd:30:89:99 eth_type ipv4 enc_dst_ip 7.7.7.5 enc_src_ip 7.7.7.1 enc_key_id 98 enc_dst_port 4789 enc_tos 0 ip_flags nofrag in_hw in_hw_count 1 action order 1: tunnel_key unset pipe index 2 ref 1 bind 1 installed 434 sec used 434 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: mirred (Egress Redirect to device enp8s0f0_1) stolen index 4 ref 1 bind 1 installed 434 sec used 0 sec Action statistics: Sent 129936 bytes 1082 pkt (dropped 0, overlimits 0 requeues 0) Sent software 0 bytes 0 pkt Sent hardware 129936 bytes 1082 pkt backlog 0b 0p requeues 0 cookie ac17cf398c4c69e4a5b2f7aabd1b88ff no_percpu used_hw_stats delayed Co-developed-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
84ae9c1f |
|
23-Sep-2020 |
Vlad Buslov <vladbu@nvidia.com> |
net/mlx5e: E-Switch, Maintain vhca_id to vport_num mapping Following patches in the series need to be able to map VF netdev to vport. Since it is trivial to obtain vhca_id from netdev, maintain mapping from vhca_id to vport_num inside eswitch offloads using xarray. Provide function mlx5_eswitch_vhca_id_to_vport() to be used by TC code in following patches to obtain the mapping. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
8f010541 |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: SF, Add port add delete functionality To handle SF port management outside of the eswitch as independent software layer, introduce eswitch notifier APIs so that mlx5 upper layer who wish to support sf port management in switchdev mode can perform its task whenever eswitch mode is set to switchdev or before eswitch is disabled. Initialize sf port table on such eswitch event. Add SF port add and delete functionality in switchdev mode. Destroy all SF ports when eswitch is disabled. Expose SF port add and delete to user via devlink commands. $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88 pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached or by its unique port index: $ devlink port show pci/0000:06:00.0/32768 pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false function: hw_addr 00:00:00:00:00:00 state inactive opstate detached $ devlink port show ens2f0npf0sf88 -jp { "port": { "pci/0000:06:00.0/32768": { "type": "eth", "netdev": "ens2f0npf0sf88", "flavour": "pcisf", "controller": 0, "pfnum": 0, "sfnum": 88, "external": false, "splittable": false, "function": { "hw_addr": "00:00:00:00:00:00", "state": "inactive", "opstate": "detached" } } } } Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d970812b |
|
11-Dec-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Add eswitch helpers for SF vport Add helpers to enable/disable eswitch port, register its devlink port and load its representor. Signed-off-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
d7f33a45 |
|
11-Dec-2020 |
Vu Pham <vuhuong@nvidia.com> |
net/mlx5: E-switch, Prepare eswitch to handle SF vport Prepare eswitch to handle SF vport during (a) querying eswitch functions (b) egress ACL creation (c) account for SF vports in total vports calculation Assign a dedicated placeholder for SFs vports and their representors. They are placed after VFs vports and before ECPF vports as below: [PF,VF0,...,VFn,SF0,...SFm,ECPF,UPLINK]. Change functions to map SF's vport numbers to indices when accessing the vports or representors arrays, and vice versa. Signed-off-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c7eddc60 |
|
31-Aug-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Move devlink eswitch ports closer to eswitch Currently devlink eswitch ports are registered and unregistered by the representor layer. However it is better to register them at eswitch layer so that in future user initiated command port add and delete commands can also register/unregister devlink ports without depending on representor layer. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Vu Pham <vuhuong@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
c620b772 |
|
29-Apr-2020 |
Ariel Levkovich <lariel@mellanox.com> |
net/mlx5: Refactor tc flow attributes structure In order to support chains and connection tracking offload for nic flows, there's a need to introduce a common flow attributes struct so that these features can be agnostic and have access to a single attributes struct, regardless of the flow type. Therefore, a new tc flow attributes format is introduced to allow access to attributes that are common to eswitch and nic flows. The common attributes will always get allocated for the new flows, regardless of their type, while the type specific attributes are separated into different structs and will be allocated based on the flow type to avoid memory waste. When allocating the flow attributes the caller provides the flow steering namespace and according the namespace type the additional space for the extra, type specific, attributes is determined and added to the total attribute allocation size. In addition, the attributes that are going to be common to both flow types are moved to the common attributes struct. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
#
ae430332 |
|
24-Apr-2020 |
Ariel Levkovich <lariel@mellanox.com> |
net/mlx5: Refactor multi chains and prios support Decouple the chains infrastructure from eswitch and make it generic to support other steering namespaces. The change defines an agnostic data structure to keep all the relevant information for maintaining flow table chaining in any steering namespace. Each namespace that requires table chaining will be required to allocate such data structure. The chains creation code will receive the steering namespace and flow table parameters from the caller so it will operate agnosticly when creating the required resources to maintain the table chaining function while Parts of the code that are relevant to eswitch specific functionality are moved to eswitch files. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a53cf949 |
|
08-Sep-2020 |
Parav Pandit <parav@nvidia.com> |
net/mlx5: E-switch, Read controller number from device ECPF supports one external host controller. Read controller number from the device. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
7d0314b1 |
|
05-Apr-2020 |
Ron Diskin <rondi@mellanox.com> |
net/mlx5e: Modify uplink state on interface up/down When setting the PF interface up/down, notify the firmware to update uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled. This behavior will prevent sending traffic out on uplink port when PF is down, such as sending traffic from a VF interface which is still up. Currently when calling mlx5e_open/close(), the driver only sends PAOS command to notify the firmware to set the physical port state to up/down, however, it is not sufficient. When VF is in "auto" state, it follows the uplink state, which was not updated on mlx5e_open/close() before this patch. When switchdev mode is enabled and uplink representor is first enabled, set the uplink port state value back to its FW default "AUTO". Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down") Signed-off-by: Ron Diskin <rondi@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0da3c12d |
|
20-Jul-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Reuse total_vports and avoid duplicate nvports Total e-switch vports are already stored in mlx5_eswitch total_vports. Avoid copy of it in nvports and reuse existing total_vports calculation. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
3d5f41ca |
|
27-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Avoid function change handler for non ECPF for non ECPF eswitch manager function, vports are already enabled/disabled when eswitch is enabled/disabled respectively. Simplify function change handler for such eswitch manager function. Therefore, ECPF is the only one which remains PF/VF function change handler. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
188f0f98 |
|
25-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Avoid eswitch header inclusion in fs core layer Flow steering core layer is independent of the eswitch layer. Hence avoid fs_core dependency on eswitch. Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
04dfa705 |
|
28-May-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Avoid eswitch header inclusion in fs core layer Flow steering core layer is independent of the eswitch layer. Hence avoid fs_core dependency on eswitch. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
330077d1 |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Supporting setting devlink port function mac address Enable user to set mac address of the PCI PF and VF port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f099fde1 |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Support querying port function mac address Support querying mac address of the eswitch devlink port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
443bf36e |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move helper to eswitch layer To use port number to port index conversion at eswitch level, move it to eswitch header. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bd939753 |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Introduce and use eswitch support check helper Introduce an helper routine to get esw from a devlink device and use it at eswitch callbacks and in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fa997825 |
|
18-Jun-2020 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Constify mac address pointer Since none of the functions need to modify the input mac address, constify them. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
133dcfc5 |
|
28-Feb-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Alloc and free unique metadata for match Introduce infrastructure to create unique metadata for match for vport without depending on vport_num. Vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Using ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate, free metadata for match. Introduce APIs to set vport's bond_metadata and replace its ingress acl rules with bond_metatada. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
bf773dc0 |
|
16-Mar-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule By default, e-switch vport's egress acl just forward packets to its counterpart NIC vport using existing egress acl table. During port failover in bonding scenario where two VFs representors are bonded, the egress acl forward-to-vport rule will be added to the existing egress acl table of e-switch vport of passive/inactive slave representor to forward packets to other NIC vport ie. the active slave representor's NIC vport to handle egress "failover" traffic. Enable egress acl and have APIs to create and destroy egress acl forward-to-vport rule and group. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
07bab950 |
|
28-Mar-2020 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch ingress acl codes Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>
|
#
ea651a86 |
|
06-Nov-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch egress acl codes Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
14e6b038 |
|
03-Feb-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5e: Add support for hw decapsulation of MPLS over UDP MPLS over UDP is supported in hardware by using a packet reformat object with reformat type equal L3_TUNNEL_TO_L2 which both decapsulates the outer L3, L4 and MPLS headers, and allows for setting the L2 headers of the resulting decapsulated packet. For the hardware to operate correctly, the configuration of the firmware must have FLEX_PARSER_PROFILE_ENABLE = 1. Example tc rule: tc filter add dev bareudp0 protocol all prio 1 root flower enc_dst_port \ 6635 enc_src_ip 8.8.8.23 action mpls pop protocol ip pipe \ action pedit ex munge eth dst set 00:11:22:33:44:21 pipe action \ mirred egress redirect dev enp59s0f0_0 We use pedit to set the correct destination MAC. For MPLS over UDP decapsulation to take place, the driver logic requires the following: 1. flower filter added on bareudp device. 2. action mpls pop 3. zero or more pedit munge actions 4. one redirect action Current implementation supports only IPv4 and no VLAN. tc filter show output looks like this: filter protocol all pref 1 flower chain 0 filter protocol all pref 1 flower chain 0 handle 0x1 enc_src_ip 8.8.8.24 enc_dst_port 6635 in_hw in_hw_count 1 action order 1: mpls pop protocol ip pipe index 2 ref 1 bind 1 action order 2: pedit action pipe keys 2 index 1 ref 1 bind 1 key #0 at eth+0: val 00112233 mask 00000000 key #1 at eth+4: val 44210000 mask 0000ffff action order 3: mirred (Egress Redirect to device enp59s0f0_0) stolen index 2 ref 1 bind 1 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e08a6832 |
|
08-Apr-2020 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Update eswitch to new cmd interface Do mass update of eswitch to reuse newly introduced mlx5_cmd_exec_in*() interfaces. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
#
84be2fda |
|
28-Mar-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Fix condition for termination table cleanup When we destroy rules from slow path we need to avoid destroying termination tables since termination tables are never created in slow path. By doing so we avoid destroying the termination table created for the slow path. Fixes: d8a2034f152a ("net/mlx5: Don't use termination tables in slow path") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8e0aa4bc |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Protect eswitch mode changes Currently eswitch mode change is occurring from 2 different execution contexts as below. 1. sriov sysfs enable/disable 2. devlink eswitch set commands Both of them need to access eswitch related data structures in synchronized manner. Without any synchronization below race condition exist. SR-IOV enable/disable with devlink eswitch mode change: cpu-0 cpu-1 ----- ----- mlx5_device_disable_sriov() mlx5_devlink_eswitch_mode_set() mlx5_eswitch_disable() esw_offloads_stop() esw_offloads_disable() mlx5_eswitch_disable() esw_offloads_disable() Hence, they are synchronized using a new mode_lock. eswitch's state_lock is not used as it can lead to a deadlock scenario below and state_lock is only for vport and fdb exclusive access. ip link set vf <param> netlink rcv_msg() - Lock A rtnl_lock vfinfo() esw->state_lock() - Lock B devlink eswitch_set devlink_mutex esw->state_lock() - Lock B attach_netdev() register_netdev() rtnl_lock - Lock A Alternatives considered: 1. Acquiring rtnl lock before taking esw->state_lock to follow similar locking sequence as ip link flow during eswitch mode set. rtnl lock is not good idea for two reasons. (a) Holding rtnl lock for several hundred device commands is not good idea. (b) It leads to below and more similar deadlocks. devlink eswitch_set devlink_mutex rtnl_lock - Lock A esw->state_lock() - Lock B eswitch_disable() reload() ib_register_device() ib_cache_setup_one() rtnl_lock() 2. Exporting devlink lock may lead to undesired use of it in vendor driver(s) in future. 3. Unloading representors outside of the mode_lock requires serialization with other process trying to enable the eswitch. 4. Differing the representors life cycle to a different workqueue requires synchronization with func_change_handler workqueue. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ebf77bb8 |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Extend eswitch enable to handle num_vfs change Subsequent patch protects eswitch mode changes across sriov and devlink interfaces. It is desirable for eswitch to provide thread safe eswitch enable and disable APIs. Hence, extend eswitch enable API to optionally update num_vfs when requested. In subsequent patch, eswitch num_vfs are updated after all the eswitch users eswitch drops its reference count. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d8a2034f |
|
26-Feb-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Don't use termination tables in slow path Don't use termination tables for packets that are steered to the slow path, as a pre-step for supporting packet encap (packet reformat) action on termination tables. Packet encap (reformat action) actions steer the packet to the slow path until outer arp entries are resolved. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b5f814cc |
|
01-Mar-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: Avoid configuring eswitch QoS if not supported Check if QoS is enabled for the eswitch before attempting to configure QoS parameters and emit a netlink error if not supported. Introduce an API to check if QoS is supported for the eswitch. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
9d3faa51 |
|
13-Mar-2020 |
Nathan Chancellor <nathan@kernel.org> |
net/mlx5: Add missing inline to stub esw_add_restore_rule When CONFIG_MLX5_ESWITCH is unset, clang warns: In file included from drivers/net/ethernet/mellanox/mlx5/core/main.c:58: drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:670:1: warning: unused function 'esw_add_restore_rule' [-Wunused-function] esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag) ^ 1 warning generated. This stub function is missing inline; add it to suppress the warning. Fixes: 11b717d61526 ("net/mlx5: E-Switch, Get reg_c0 value on CQE") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
23bb50cf |
|
12-Nov-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Update VF vports config when num of VFs changed Currently, ECPF eswitch manager does one-time only configuration for VF vports when device switches to offloads mode. However, when num of VFs changed from host side, driver doesn't update VF vports configurations. Hence, perform VFs vport configuration update whenever num_vfs change event occurs. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c2d7712c |
|
11-Nov-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Introduce per vport configuration for eswitch modes Both legacy and offload modes require vport setup, only offload mode requires rep setup. Before this patch, vport and rep operations are separated applied to all relevant vports in different stages. Change to use per vport configuration, so that vport and rep operations are modularized per vport. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4c3844d9 |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5e: CT: Introduce connection tracking Add support for offloading tc ct action and ct matches. We translate the tc filter with CT action the following HW model: +-------------------+ +--------------------+ +--------------+ + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +-----> + original match + | + tuple + zone match + | + fte_id match + | +-------------------+ | +--------------------+ | +--------------+ | v v v set chain miss mapping set mark original set fte_id set label filter set zone set established actions set tunnel_id do nat (if needed) do decap Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6fb0701a |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Add support for offloading rules with no in_port FTEs in global tables may match on packets from multiple in_ports. Provide the capability to omit the in_port match condition. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d18296ff |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Introduce global tables Currently, flow tables are automatically connected according to their <chain,prio,level> tuple. Introduce global tables which are flow tables that are detached from the eswitch chains processing, and will be connected by explicitly referencing them from multiple chains. Add this new table type, and allow connecting them by refenece. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5b7cb745 |
|
11-Mar-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable reg c1 loopback when possible Enable reg c1 loopback if firmware reports it's supported, as this is needed for restoring packet metadata (e.g chain). Also define helper to query if it is enabled. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cc617ced |
|
18-Dec-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, make query inline mode a static function mlx5_eswitch_inline_mode_get() is used only in eswitch_offloads.c. Hence, make it static and adjacent to its caller function. Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
87dac697 |
|
26-Dec-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Add devlink fdb_large_groups parameter Add a devlink parameter to control the number of large groups in a autogrouped flow table. The default value is 15, and the range is between 1 and 1024. The size of each large group can be calculated according to the following formula: size = 4M / (fdb_large_groups + 1). Examples: - Set the number of large groups to 20. $ devlink dev param set pci/0000:82:00.0 name fdb_large_groups \ cmode driverinit value 20 Then run devlink reload command to apply the new value. $ devlink dev reload pci/0000:82:00.0 - Read the number of large groups in flow table. $ devlink dev param show pci/0000:82:00.0 name fdb_large_groups pci/0000:82:00.0: name fdb_large_groups type driver-specific values: cmode driverinit value 20 Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
96e32687 |
|
14-Jan-2020 |
Eli Cohen <eli@mellanox.com> |
net/mlx5e: Eswitch, Use per vport tables for mirroring When using port mirroring, we forward the traffic to another table and use that table to forward to the mirrored vport. Since the hardware loses the values of reg c, and in particular reg c0, we fail the match on the input vport which previously existed in reg c0. To overcome this situation, we use a set of per vport tables, positioned at the lowest priority, and forward traffic to those tables. Since these tables are per vport, we can avoid matching on reg c0. Fixes: c01cfd0f1115 ("net/mlx5: E-Switch, Add match on vport metadata for rule in fast path") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6724e66b |
|
15-Feb-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Get reg_c1 value on miss The HW model implicitly decapsulates tunnels on chain 0 and sets reg_c1 with the mapped tunnel id. On miss, the packet does not have the outer header and the driver restores the tunnel information from the tunnel id. Getting reg_c1 value in software requires enabling reg_c1 loopback and copying reg_c1 to reg_b. reg_b comes up on CQE as cqe->imm_inval_pkey. Use the reg_c0 restoration rules to also copy reg_c1 to reg_B. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
11b717d6 |
|
15-Feb-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Get reg_c0 value on CQE On RX side create a restore table in OFFLOADS namespace. This table will match on all values for reg_c0 we will use, and set it to the flow_tag. This flow tag can then be read on the CQE. As there is no copy action from reg c0 to flow tag, instead we have to set the flow tag explictily. We add an API so callers can add all the used reg_c0 values (tags) and for each of those we add a restore rule. This will be used in a following patch to save the miss chain mapping tag on reg_c0 and from it restore the tc chain on the skb. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
39ac237c |
|
07-Jan-2020 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Refactor chains and priorities To support the entire chain and prio range (32bit + 16bit), instead of a using a static array of chains/prios of limited size, create them dynamically, and use a rhashtable to search for existing chains/prio combinations. This will be used in next patch to actually increase the number using unamanged tables support and ignore flow level capability. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e66cbc96 |
|
26-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: ft: Use getter function to get ft chain FT chain is defined as the next chain after tc. To prepare for next patches that will increase the number of tc chains available at runtime, use a getter function to get this value. The define is still used in static fs_core allocation, to calculate the number of chains. This static allocation will be used if the relevant capabilities won't be available to support dynamic chains. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b7826076 |
|
12-Nov-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5e: E-switch, Fix Ingress ACL groups in switchdev mode for prio tag In cited commit, when prio tag mode is enabled, FTE creation fails due to missing group with valid match criteria. Hence, (a) create prio tag group metadata_prio_tag_grp when prio tag is enabled with match criteria for vlan push FTE. (b) Rename metadata_grp to metadata_allmatch_grp to reflect its purpose. Also when priority tag is enabled, delete metadata settings after deleting ingress rules, which are using it. Tide up rest of the ingress config code for unnecessary labels. Fixes: 10652f39943e ("net/mlx5: Refactor ingress acl configuration") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
975b992f |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Add new chain for netfilter flow table offload Netfilter tables (nftables) implements a software datapath that comes after tc ingress datapath. The datapath supports offloading such rules via the flow table offload API. This API is currently only used by NFT and it doesn't provide the global priority in regards to tc offload, so we assume offloading such rules must come after tc. It does provide a flow table priority parameter, so we need to provide some supported priority range. For that, split fastpath prio to two, flow table offload and tc offload, with one dedicated priority chain for flow table offload. Next patch will re-use the multi chain API to access this chain by allowing access to this chain by the fdb_sub_namespace. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4db7b98e |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Define fdb tc levels per prio Define FDB_TC_LEVELS_PER_PRIO instead of magic number 2. This is the number of levels used by each tc prio table in the fdb. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2cf2954b |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines Rename it to prepare for next patch that will add a different type of offload to the FDB. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
12063c2e |
|
11-Nov-2019 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Simplify fdb chain and prio eswitch defines FDB_MAX_CHAIN and FDB_MAX_PRIO were defined differently depending on if CONFIG_MLX5_ESWITCH is enabled to save space on allocations. This is a minor space saving, and there is no real need for it. Simplify things instead, and define them the same in both cases. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
556b9d16 |
|
03-Sep-2019 |
Aya Levin <ayal@mellanox.com> |
net/mlx5: Clear VF's configuration on disabling SRIOV When setting number of VFs to 0 (disable SRIOV), clear VF's configuration. Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
238302fa |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Enable metadata on own vport Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe (manager vport). Metadata when supported, must be enabled on own vport which is used to pass metadata to vport of NIC Rx Flow Table. Due to this error, traffic tagged by ingress ACL is not processed correctly at NIC rx flow table level which is supposed to work on metadata tag. Hence, instead of working on eswitch manager vport, always working on eswitch own vport regardless of PF or ECPF. Given that mlx5_eswitch_query/modify_esw_vport_context() is used to access other vport in legacy mode and own vport settings in switchdev mode, extend low level API to explicitly specify other_vport. Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10652f39 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Refactor ingress acl configuration Drop, untagged, spoof check and untagged spoof check flow groups are limited to legacy mode only. Therefore, following refactoring is done to (a) improve code readability (b) have better code split between legacy and offloads mode 1. Move legacy flow groups under legacy structure 2. Add validity check for group deletion 3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode 4. Rename esw_vport_enable_ingress_acl() to esw_vport_create_ingress_acl_table() and limit its scope to table creation 5. Introduce legacy flow groups creation helper esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode 6. Reduce offloads ingress groups from 4 to just 1 metadata group per vport 7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free. 8. Shortern error message to remove redundant 'E-switch' Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a962d7a6 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Restrict metadata disablement to offloads mode Now that there is clear separation for acl setup/cleanup between legacy and offloads mode, limit metdata disablement to offloads mode. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
748da30b |
|
28-Oct-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-switch, Offloads shift ACL programming during enable/disable vport Currently legacy mode enables ACL while enabling vport, while offloads mode enable ACL when moving to offloads mode. Bring consistency to both modes by enabling/disabling ACL when enabling/disabling a vport. It also eliminates creating ingress ACL table on unused ECPF vport in offloads mode. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
925a6acc |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Prepare code to handle vport enable error In subsequent patch, esw_enable_vport() could fail and return error. Prepare code to handle such error. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
853b5352 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move legacy drop counter and rule under legacy structure To improve code readability, move legacy drop counters and droup rule under legacy structure. While at it, (a) prefix drop flow counters helper with legacy_. (b) nullify the rule pointers only if they were valid. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d68316b5 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Move metdata fields under offloads structure Metadata fields are offload mode specific. To improve code readability, move metadata under offloads structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
99ecd646 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Correct comment for legacy fields fdb_table is used for both legacy and offloads mode. It was incorrect to comment that fdb_table is legacy specific. Hence, fix the comment to reflect that fdb_table is used in legacy and offloads mode. Fixes: 131ce7014043 ("net/mlx5: E-Switch, Remove redundant mc_promisc NULL check") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ea2300e0 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Introduce and use mlx5_esw_is_manager_vport() Currently esw_enable_vport() does vport check for zero to enable drop counters regardless of execution on ECPF/PF. While esw_disable_vport() considers such scenario. To keep consistency across code for checking for manager_vport, introduce and use mlx5_esw_is_manager_vport() to check if a specified vport is eswitch manager vport or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fdde49e0 |
|
28-Oct-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Introduce and use vlan rule config helper Between legacy mode and switchdev mode, only two fields are changed, vlan_tag and flow action. Hence to avoid duplicte code between two modes, introduce and and use helper function to configure allowed VLAN rule. While at it, get rid of duplicate debug message. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8463daf1 |
|
18-Aug-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Add support to use SMFS in switchdev mode In case that flow steering mode of the driver is SMFS (Software Managed Flow Steering), then use the DR (SW steering) API to create the steering objects. In addition, add a call to the set peer namespace when switchdev gets devcom pair event. It is required to support VF LAG in SMFS. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2b688ea5 |
|
15-Aug-2019 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Add flow steering actions to fs_cmd shim layer Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
61086f39 |
|
02-Aug-2019 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Protect encap hash table with mutex To remove dependency on rtnl lock, protect encap hash table from concurrent modifications with new "encap_tbl_lock" mutex. Use the mutex to protect internal encap entry state from concurrent modification. This is necessary because a flow can be attached to multiple encap entries simultaneously, which significantly complicates using finer grained per-entry lock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
dd58edc3 |
|
01-Jun-2018 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Extend mod header entry with reference counter List of flows attached to mod header entry is used as implicit reference counter (mod header entry is deallocated when list becomes free) and as a mechanism to obtain mod header entry that flow is attached to (through list head). This is not safe when concurrent modification of list of flows attached to mod header entry is possible. Proper atomic reference counter is required to support concurrent access. As a preparation for extending mod header with reference counting, extract code that lookups and deletes mod header entry into standalone put/get helpers. In order to remove this dependency on external locking, extend mod header entry with reference counter to manage its lifetime and extend flow structure with direct pointer to mod header entry that flow is attached to. To remove code duplication between legacy and switchdev mode implementations that both support mod_hdr functionality, store mod_hdr table in dedicated structure used by both fdb and kernel namespaces. New table structure is extended with table lock by one of the following patches in this series. Implement helper function to get correct mod_hdr table depending on flow namespace. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
93b3586e |
|
17-Jul-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: Support inner header match criteria for non decap flow action We have an issue that OVS application creates an offloaded drop rule that drops VXLAN traffic with both inner and outer header match criteria. mlx5_core driver detects correctly the inner and outer header match criteria but does not enable the inner header match criteria due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that only decap rule needs inner header criteria. Solution: Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add two new members: inner_match_level and outer_match_level. inner/outer_match_level is set to NONE if the inner/outer match criteria is not specified in the tc rule creation request. The decap assumption is removed and the code just needs to check for inner/outer_match_level to enable the corresponding bit in firmware's match_criteria_enable value. Fixes: 6363651d6dd7 ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
fcb64c0f |
|
08-May-2019 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: E-Switch, add ingress rate support Use the scheduling elements to implement ingress rate limiter on an eswitch ports ingress traffic. Since the ingress of eswitch port is the egress of VF port, we control eswitch ingress by controlling VF egress. Configuration is done using the ports' representor net devices. Please note that burst size configuration is not supported by devices ConnectX-5 and earlier generations. Configuration examples: tc: tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k ovs: ovs-vsctl set interface eth0 ingress_policing_rate=1000 Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5896b972 |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Tide up eswitch config sequence Currently for PF and ECPF vports, representors are created before their eswitch hardware ports are initialized in below flow. mlx5_eswitch_enable() esw_offloads_init() esw_offloads_load_all_reps() [..] esw_enable_vport() However for VFs, vports are initialized before creating their respective netdev represnetors in event handling context. Similarly while disabling eswitch, first hardware vports are disabled, followed by destroying their representors. Here while underlying vports gets destroyed but its respective user facing netdevice can still exist on which user can continue to perform more offload operations. Instead, its more accurate to do enable_eswitch switchdev mode: 1. perform FDB tables initialization 2. initialize hw vport 3. create and publish representor for this vport disable_eswitch switchdev mode: 1. destroy user facing representor for the vport 2. disable hw vport 3. perform FDB tables cleanup Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
131ce701 |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-Switch, Remove redundant mc_promisc NULL check mc_promisc pointer points to an instance of struct esw_mc_addr allocated as part of the esw structure. Hence it cannot be NULL. Removed such redundant check and assign where it is actually used. While at it, add comment around legacy mode fields and move mc_promisc close to other legacy mode structures to improve code redability. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5019833d |
|
29-Jul-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-switch, Introduce helper function to enable/disable vports vports needs to be enabled in switchdev and legacy mode. In switchdev mode, vports should be enabled after initializing the FDB tables and before creating their represntors so that representor works on an initialized vport object. Prepare a helper function which can be called when enabling either of the eswitch modes. Similarly, have disable vports helper function. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
525e84be |
|
18-Nov-2018 |
Vlad Buslov <vladbu@mellanox.com> |
net/mlx5e: Eswitch, change offloads num_flows type to atomic64 Eswitch implements its own locking by means of state_lock mutex and multiple fine-grained lock in containing data structures, and is supposed to not rely on rtnl lock. However, eswitch offloads num_flows type is a regular long long integer and cannot be modified concurrently. This is an implicit assumptions that mlx5 tc is serialized (by rtnl lock or any other means). In order to remove implicit dependency on rtnl lock, change num_flows type to atomic64 to allow concurrent modifications. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
dd28087c |
|
07-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Refactor mlx5_esw_query_functions for modularity Functions change event output data size changes when functions other than VFs will be enabled in HCA CAP. With current API, multiple callers needs to align, calculate accurate size of the output data depending on number on non VF functions enabled in the device. Instead of duplicating such math at multiple places, refactor mlx5_esw_query_functions() to return raw output allocated by itself. Caller must free the allocated memory using kvfree() as described in the function comment section. This hides calcuation within mlx5_esw_query_functions() and provides simpler API. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
411ec9e0 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop When ECPF is the eswitch manager, host PF is treated like other VFs. Driver should do the same for inline mode and vlan pop. Add new iterators to include host PF if ECPF is the eswitch manager. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
16fff98a |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Reg/unreg function changed event at correct stage When driver is doing eswitch mode change, it's critical to keep number of enabled VFs unchanged. However, it can be changed on the fly once function changed event is registered. To remove this uncertainty, function changed event should not be registered before all setups, and first be unregistered before all cleanups. Wrap this functionality together with vport event handler. Fixes: 61fc880839e6 ("net/mlx5: E-Switch, Handle representors creation in handler context") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
062f4bf4 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consolidate eswitch function number of VFs Enabled number of VFs is key for eswich manager to do flow steering initialization and vport configurations. However, the number of enabled VFs may come from two sources as below. PF: num of VFs is provided by enabled SR-IOV of itself. ECPF: num of VFs is provided by enabled SR-IOV from its peer PF. And SR-IOV can't be enabled from ECPF itself. Current driver handles the two cases in different stages and passing the number of enabled VFs among a large scope of internal functions. It is usually hard to find out where is the real number of VFs from due to layers of argument pass-in. This patch consolidated that number from the entry point of doing eswitch setup, and maintained a copy so that eswitch functions can refer to it directly. Eswitch driver shall always use this number when referring to enabled number of VFs, don't use other numbers such as from SR-IOV. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f6455de0 |
|
28-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Refactor eswitch SR-IOV interface Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
57843868 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Add query and modify esw vport context functions Add esw vport query and modify functions, and exposing them is needed for enabling or disabling registers passed as metatdata to vport NIC_RX table in slow path. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7445cfb1 |
|
25-Jun-2019 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
91d6291c |
|
25-Jun-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Introduce a helper API to check VF vport Introduce a helper API mlx5_eswitch_is_vf_vport() to check if a given vport_num belongs to VF or not. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
98fdbea5 |
|
12-Jun-2019 |
Leon Romanovsky <leon@kernel.org> |
net/mlx5: Declare more strictly devlink encap mode Devlink has UAPI declaration for encap mode, so there is no need to be loose on the data get/set by drivers. Update call sites to use enum devlink_eswitch_encap_mode instead of plain u8. Suggested-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Petr Vorel <pvorel@suse.cz>
|
#
10ee82ce |
|
10-Jun-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Return raw output for query esw functions Current function only returns host num of VFs, later patch requires other params such as host maximum num of VFs. Return the raw output so that caller can extract info as needed. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10caabda |
|
18-Apr-2019 |
Oz Shlomo <ozsh@mellanox.com> |
net/mlx5e: Use termination table for VLAN push actions HW does not support push VLAN action in the RX direction (packets arriving from the wire). The FW works around this limitation by haripining the packet. The hairpin workaround applies only when the push VLAN action is specified in a termination table, assuring that there are no actions following the haripin. Instantiate termination table for push VLAN actions. Re-use identical terminating tables for increased HW cache efficiency. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8693115a |
|
29-May-2019 |
Parav Pandit <parav@mellanox.com> |
{IB,net}/mlx5: Constify rep ops functions pointers Currently for every representor type and for every single vport, representer function pointers copy is stored even though they don't change from one to other vport. Additionally priv data entry for the rep is not passed during registration, but its copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data, 1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops prefix with other standard netdev, ibdev ops. 2. Constify the IB and Ethernet rep ops structure. 3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type. 4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6706a3b9 |
|
29-May-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Honor eswitch functions changed event cap Whenever device supports eswitch functions changed event, honor such device setting. Do not limit it to ECPF. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cd56f929 |
|
29-May-2019 |
Vu Pham <vuhuong@mellanox.com> |
net/mlx5: E-Switch, Replace host_params event with functions_changed event To support sriov on a E-Switch manager, num_vfs are queried to the firmware whenever E-Switch manager is notified by esw_functions_changed event. Replace host_params event with esw_functions_changed event that reflects more appropriate naming. While at it, also correct num_vfs type from int to u16 as expected by the function mlx5_esw_query_functions(). Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
02f3afd9 |
|
05-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: E-Switch, Correct type to u16 for vport_num and int for vport_index To avoid any ambiguity between vport index and vport number, rename functions that had vport, to vport_num or vport_index appropriately. vport_num is u16 hence change mlx5_eswitch_index_to_vport_num() return type to u16. vport_index is an int in vport array. Hence change input type of vport index in mlx5_eswitch_index_to_vport_num() to int. Correct multiple eswitch representor interfaces use type u16 of rep->vport as type int vport_index. Send vport FW commands with correct eswitch u16 vport_num instead host int vport_index. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5d9986a3 |
|
15-Apr-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Fix the check of legal vport The check of legal vport is to ensure the vport number falls between 0 and total number of vports. Along with the introduction of uplink rep, enabled vports are not consecutive any more. Therefore, rely on the eswitch vport getter function to check if it's a valid vport. As the getter function relies on eswitch, add the check of vport group manager and validation the presence of eswitch structure. Remove the redundant check in the function calls. Since the vport array will be allocated once eswitch is initialized and will be kept alive if eswitch presents, no need to protect it with the state lock. Fixes: 5ae5162066d8 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport") Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4314ebaa |
|
15-Apr-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Use getter to access all vport array Some functions issue vport commands and access vport array using vport_index/vport_num interchangeably which is OK for VFs vports. However, this creates potential bug if those vports are not VFs (E.g, uplink, sf) where their vport_index don't equal to vport_num. Prepare code to access mlx5_vport structure using a getter function. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
786ef904 |
|
20-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Reuse mlx5_esw_for_each_vf_vport macro in two files Currently mlx5_esw_for_each_vf_vport iterates over mlx5_vport entries in eswitch.c Same macro in eswitch_offloads.c iterates over vport number in eswitch_offloads.c Instead of duplicate macro names, to avoid confusion and to reuse the same macro in both files, move it to eswitch.h. To iterate over vport numbers where there is no need to iterate over mlx5_vport, but only a vport number is needed, rename those macros in eswitch_offloads.c to mlx5_esw_for_each_vf_num_vport*. While at it, keep all vport and vport rep iterators together. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
18486737 |
|
03-Mar-2019 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: ACLs for priority tag mode Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN push on RX path. As a workaround, untagged packets are tagged with VID 0x000 allowing pop/push actions to be exchanged with VLAN rewrite actions. Use the ingress ACL table, preceding the FDB, to push VLAN 0x000 ID tag for untagged packets and the egress ACL table, succeeding the FDB, to pop VLAN 0x000 ID tag. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
27b942fb |
|
29-Apr-2019 |
Parav Pandit <parav@mellanox.com> |
net/mlx5: Get rid of storing copy of device name Currently mlx5 core stores copy of the PCI device name in a mlx5_priv structure and uses pr_warn, pr_err helpers. Get rid of the copy of this name; instead store the parent device pointer that contains name as well as dma specific parameters. This also allows to use kernel's well defined dev_warn, dev_err, dev_dbg device specific print routines. This is also a preparation patch to access non PCI parent device in future. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ee576ec1 |
|
21-Mar-2019 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5e: Fix compilation warning in en_tc.c Amazingly a mlx5e_tc function is being called from the eswitch layer, which is by itself very terrible! The function was declared locally in eswitch_offloads.c so it could be used there, which caused the following compilation warning, fix that. drivers/.../mlx5/core/en_tc.c:3242:6: [-Werror=missing-prototypes] error: no previous prototype for ‘mlx5e_tc_clean_fdb_peer_flows’ Fixes: 04de7dda7394 ("net/mlx5e: Infrastructure for duplicated offloading of TC flows") Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
544fe7c2 |
|
17-Feb-2019 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: Activate HW multipath and handle port affinity based on FIB events To support multipath offload we are going to track SW multipath route and related nexthops. To do that we register to FIB notifier and handle the route and next-hops events and reflect that as port affinity to HW. When there is a new multipath route entry that all next-hops are the ports of an HCA we will activate LAG in HW. Egress wise, we use HW LAG as the means to emulate multipath on current HW which doesn't support port selection based on xmit hash. In the presence of multiple VFs which use multiple SQs (send queues) this yields fairly good distribution. HA wise, HW LAG buys us the ability for a given RQ (receive queue) to receive traffic from both ports and for SQs to migrate xmitting over the active port if their base port fails. When the route entry is being updated to single path we will update the HW port affinity to use that port only. If a next-hop becomes dead we update the HW port affinity to the living port. When all next-hops are alive again we reset the affinity to default. Due to FW/HW limitations, when a route is deleted we are not disabling the HW LAG since doing so will not allow us to enable it again while VFs are bounded. Typically this is just a temporary state when a routing daemon removes dead routes and later adds them back as needed. This patch only handles events for AF_INET. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8da202b2 |
|
21-Jan-2019 |
Huy Nguyen <huyn@mellanox.com> |
net/mlx5: E-Switch, Add support for VEPA in legacy mode. In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips the system internal virtual switch and forwards to external network switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch. To support this, an new FDB flow table are created with level 0 and linked to the existing FDB flow table in legacy mode. By default, VEPA is turned off and this FDB flow table is empty. When VEPA is turned on, two rules are created. One rule to forward on uplink vport traffic to the legacy FDB. The other rule forward all other traffic to uplink vport. Other design alternatives were not chosen as explained below: 1. Create a forward rule in ACL flow table (most efficient design). This approach is the not chosen because firmware does not support forward rule to uplink vport (0xffff) for ACL flow table. 2. Add additional source port criteria in all the FDB rules to make the FDB rules to be received rules only. This approach is not chosen because it is not efficient as there can many rules in the FDB and VEPA mode cannot be controlled per vport. 3. Add a highest prioirty flow group in the existing legacy FDB Flow Table instead of a new flow table. This approoach does not work because the new flow group has the same match criteria as the promiscuous flow group and mlx5_add_flow_rules does not allow specifying flow group. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a3888f33 |
|
29-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Load/unload VF reps according to event from host PF When host PF changes the number of VFs, the ECPF esw driver will get a FW event. It should query the number of VFs enabled by host PF and update the VF reps accordingly. Note that host PF can't change the number of VFs dynamically, it has to reset the number of VFs to 0 before changing to a new positive number. The host event is registered when driver is moving to switchdev mode, and it's the last step to do in esw_offloads_init. It's unregistered and the work queue is flushed when driver quits from switchdev mode. In this way, the host event and devlink command are serialized. When driver is enabling switchdev mode, pay attention to the following two facts: 1. Host PF must not have VF initialized as the flow table in ECPF has ENCAP enabled as default. Such flow table can't be created with existing initialized VFs. 2. ECPF doesn't know how many VFs the host PF will enable, ECPF offloads flow steering shall create the flow table/groups based on the max number of VFs possibly supported by host PF. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
81cd229c |
|
10-Dec-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Consider ECPF vport depends on eswitch ownership ECPF connects to the eswitch through vport 0xfffe. ECPF may or may not be the eswitch manager depending on firmware configuration. 1. If ECPF is eswitch manager: ECPF will take over the eswitch manager responsibility. A rep of the host PF shall be created at the ECPF side for the eswitch manager to control. 2. If ECPF is not eswitch manager: host PF will be the eswitch manager, ECPF acts similar as a VF to the host PF. Host PF will be aware of the ECPF vport presence and control it's rep. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ae51620 |
|
14-Dec-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Assign a different position for uplink rep and vport In offloads mode, the current implementation puts the uplink representor at index zero of the vport reps array. It is not "natural" to place it at index 0 since we want to put the representor for vport 0 at index 0 with the introduction of SmartNIC. A separate patch will handle the case whether a rep is needed for vport 0 (PF vport). So, we want to have a different placeholder for uplink vport and representor. It was placed at the end of vport and rep array. Since vport number can no longer act as an index into the vport or representors arrays, use functions to map vport numbers to indices when accessing the vports or representors arrays, and vice versa. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c9b99abc |
|
31-Jan-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Split VF and special vports for offloads mode When driver is entering offloads mode, there are two major tasks to do: initialize flow steering and create representors. Flow steering should make sure enough flow table/group spaces are reserved for all reps. Representors will be created in a group, all or none. With the introduction of ECPF, flow steering should still reserve the same spaces. But, the representors are not always loaded/unloaded in a single piece. Once ECPF is in offloads mode, it will get the number of VF changing event from host PF. In such scenario, only the VF reps should be loaded/unloaded, not the reps for special vports (such as the uplink vport). Thus, when entering offloads mode, driver should specify the total number of reps, and the number of VF reps separately. When leaving offloads mode, the cleanup should use the information self-contained in eswitch such as number of VFs. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a1b3839a |
|
08-Nov-2018 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Properly refer to the esw manager vport In SmartNIC mode, the eswitch manager is not necessarily the PF (vport 0). Use a helper function to get the correct eswitch manager vport number and cache on the eswitch instance for fast reference. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b05af6aa |
|
12-Feb-2019 |
Bodong Wang <bodong@mellanox.com> |
net/mlx5: E-Switch, Normalize the name of uplink vport number Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to comply with the same naming convention along with the introduction of other vports. Use MLX5_VPORT as the prefix for such vports and relocate the uplink vport definition to public header file for the benefits of both net and IB drivers. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6363651d |
|
10-Jan-2019 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Properly set steering match levels for offloaded TC decap rules The match level computed by the driver gets to be wrong for decap rules with wildcarded inner packet match such as: tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789 action tunnel_key unset action mirred egress redirect dev eth1 The FW errs for a missing matching meta-data indicator for the outer headers (where we do have a match), and a wrong matching meta-data indicator for the inner headers (where we don't have a match). Fix that by taking into account the matching on the tunnel info and relating the match level of the encapsulated packet to the firmware inner headers indicator in case of decap. As for vxlan we mandate a match on the tunnel udp dst port, and in general we practically madndate a match on the source or dest ip for any IP tunnel, the fix was done in a minimal manner around the tunnel match parsing code. Fixes: d708f902989b ('net/mlx5e: Get the required HW match level while parsing TC flow matches') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com> Reviewed-by: Jianbo Liu <jianbol@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
eff849b2 |
|
06-Jun-2018 |
Rabie Loulou <rabiel@mellanox.com> |
net/mlx5: Allow/disallow LAG according to pre-req only Remove the lag forbid/allow functions, change the lag prereq check to run in the do-bond logic, so every change in the prereq state will cause LAG to be disabled/enabled accordingly after the next do-bond run. Add lag update function, so every component which changes the prereq state and want the LAG to re-calc the conditions can call the update function. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f9392795 |
|
30-Nov-2017 |
Shahar Klein <shahark@mellanox.com> |
net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules Assign a counter dev attribute according to device capability and use it for management of counters related to offloaded eswitch TC flows. With upcoming support for uplink LAG, we have two HW rules per one logical SW (TC) rule. Although the HW supports attaching one counter to multiple rules, we are allocating counter per HW rule. We need this separation for two reasons: 1. "flow eswitch" counter affinity HW require the counter to be allocated on the device where the eswitch rule is set. 2. for some use-cases (multi-path routing) each HW flow relates to different neighbour, hence our neigh update logic must have a per-rule HW accountant in order to provide the proper feedback to the kernel. Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
04de7dda |
|
11-Nov-2018 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: Infrastructure for duplicated offloading of TC flows Under uplink LAG or multipath schemes, traffic that matches one flow might arrive on both uplink ports and transmitted through both as part of supporting aggregation and high-availability. To cope with the fact that the SW model might use logical SW port (e.g uplink team or bond) but we have two HW ports with e-switch on each, there are cases where in order to offload a SW TC rule we need to duplicate it to two HW flows. Since each HW rule has its own counter we also aggregate the counter of both rules when a flow stats query is executed from user-space. Introduce the changes for the different elements (add/delete/stats), currently nothing is duplicated. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
ac004b83 |
|
11-Nov-2018 |
Roi Dayan <roid@mellanox.com> |
net/mlx5e: E-Switch, Add peer miss rules In the sriov offloads mode, packets that are not matched by any other rule are sent towards the e-switch vport manager for further processing. Under upcoming patches (e.g for uplink LAG), packets sent from VF vports belonging to esw0 (e-switch related to PF0) might end up in esw1 (e-switch related to PF1) due to muxing logic applied by the FW. In such a case we still want the missed packet to be sent to the "base" esw manager vport in order to present the control plane a consistent view of the source (VF reresentor) port. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Aviv Heller <avivh@mellanox.com> Signed-off-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
8c4dc42b |
|
18-Nov-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Support multiple encapsulations for a TC flow Currently a flow is associated with a single encap structure. The FW extended destination features enables the driver to associate a flow with multiple encap instances. Change the encap id field from a flow scope to a per destination value in the flow attributes struct. Use the encaps array to associate a flow table entry with multiple encap entries. Update the neigh logic to offload only if all encapsulations used in a flow are connected, and un-offload upon the first one disconnected. Note that the driver can now support up to two encap destinations. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f493f155 |
|
01-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Move flow attr reformat action bit to per dest flags Flow attr reformat action bit is moved from the global action bits to a per destination flags field, as a pre-step for adding additional flags to support encapsulation properties per destination, with no functionality change. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
df65a573 |
|
01-Dec-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5e: Refactor eswitch flow attr for destination specific properties Currently the eswitch flow attr structure stores each destination specific property in its own specific array. Group them in an array of destination structures as a pre-step towards adding additional destination specific field properties. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e85e02ba |
|
23-Nov-2018 |
Eli Britstein <elibr@mellanox.com> |
net/mlx5: E-Switch, Rename esw attr mirror count field The mirror count esw attributes field is used to determine if splitting the rule to two FTEs is required while programming e-switch mirroring. Rename it to split count, making it clearer with no functional change. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6933a937 |
|
20-Nov-2018 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Use async events chain Remove the explicit call to mlx5_eswitch_vport_event on MLX5_EVENT_TYPE_NIC_VPORT_CHANGE and let the eswitch register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c92a0b94 |
|
04-Sep-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Enable setting goto slow path chain action A pre-step for the tc offloads code to use this when a neigh is not available for encap rules. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e52c2802 |
|
02-Jul-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: E-Switch, Add chains and priorities A chain is a group of priorities, so use the fdb parallel sub namespaces to implement chains, and a flow table for each priority in them. Because these namespaces are parallel and in series to the slow path fdb, the chains aren't connected to one another (but to the slow path), and one must use a explicit goto action to reach a different chain. Flow tables for the priorities will be created on demand and destroyed once not used. The Firmware has four pools of tables for sizes S/XS/M/L (4k, 64k, 1m, 4m). We maintain ghost copies of the pools occupancy. When a new table is to be created, we scan the pools from large to small and find the 1st table size which can be now created. When a table is destroyed, we update the relevant pool. Multi chain/prio isn't enabled yet by this patch, for now all flows will use the default chain 0, and prio 1. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
48265006 |
|
20-Sep-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Have explicit API to delete fwd rules Be symmetric with the e-switch API to add rules which has a specific function to add fwd rules which are used as part of vport mirroring. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
328edb49 |
|
03-Jul-2018 |
Paul Blakey <paulb@mellanox.com> |
net/mlx5: Split FDB fast path prio to multiple namespaces Towards supporting multi-chains and priorities, split the FDB fast path to multiple namespaces (sub namespaces), each with multiple priorities. This patch adds a new flow steering type, FS_TYPE_PRIO_CHAINS, which is like current FS_TYPE_PRIO, but may contain only namespaces, and those will be in parallel to one another in terms of managing of the flow tables connections inside them. Meaning, while searching for the next or previous flow table to connect for a new table inside such namespace we skip the parallel namespaces in the same level under the FS_TYPE_PRIO_CHAINS prio we originated from. We use this new type for splitting the fast path prio into multiple parallel namespaces, each containing normal prios. The prios inside them (and their tables) will be connected to one another, but not from one parallel namespace to another, instead the last prio in each namespace will be connected to the next prio in the containing FDB namespace, which is the slow path prio. Signed-off-by: Paul Blakey <paulb@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b8aee822 |
|
02-Oct-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Get counters for offloaded flows from callers There's no real reason for the e-switch logic to manage the creation of counters for offloaded flows. The API already has the directive for the caller to denote they want to attach a counter to the created flow. As such, we go and move the management of flow counters to the mlx5e tc offload logic. This also lets us remove an inelegant interface where the FS layer had to provide a way to retrieve a counter from a flow rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
db7ff19e |
|
15-Aug-2018 |
Eli Britstein <elibr@mellanox.com> |
devlink: Add extack for eswitch operations Add extack argument to the eswitch related operations. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c966f7d5 |
|
16-Aug-2018 |
Gavi Teitz <gavi@mellanox.com> |
net/mlx5: E-Switch, Provide flow dest when creating vport rx rule Currently the destination for the representor e-switch rx rule is a TIR number. Towards changing that to potentially be a flow table, as part of enabling RSS for representors, modify the signature of the related e-switch API to get a flow destination. Signed-off-by: Gavi Teitz <gavi@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
cc495188 |
|
25-Apr-2018 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Support offloading double vlan push/pop tc actions As we can configure two push/pop actions in one flow table entry, add support to offload those double vlan actions in a rule to HW. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
1482bd3d |
|
02-Jul-2018 |
Jianbo Liu <jianbol@mellanox.com> |
net/mlx5e: Refactor tc vlan push/pop actions offloading Extract actions offloading code to a new function, and also extend data structures for double vlan actions. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e4ad91f2 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5e: Split offloaded eswitch TC rules for port mirroring If a TC rule needs to be split for mirroring, create two HW rules, in the first level and the second level flow tables accordingly. In the first level flow table, forward the packet to the mirror port and forward the packet to the second level flow table for further processing, eg. encap, vlan push or header re-write. Currently the matching is repeated in both stages. While here, simplify the setup of the vhca id valid indicator also in the existing code. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
592d3651 |
|
03-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5e: Parse mirroring action for offloaded TC eswitch flows Currently, we only support the mirred redirect TC sub-action. In order to support flow based vport mirroring, add support to parse the mirred mirror sub-action. For mirroring, user-space will typically set the action order such that the mirror port (mirror VF) sees packets as the original port (VF under mirroring) sent them or as it will receive them. In the general case, it means that packets are potentially sent to the mirror port before or after some actions were applied on them. To properly do that, we should follow on the exact action order as set for the flow and make sure this will also be the case when we program the HW offload. We introduce a counter for the output ports (attr->out_count), which we increase when parsing each mirred redirect/mirror sub-action and when dealing with encap. We introduce a counter (attr->mirror_count) telling us if split is needed. If no split is needed and mirroring is just multicasting to vport, the mirror count is zero, all the actions of the TC flow should apply on that single HW flow. If split is needed, the mirror count tells where to do the split, all non-mirred tc actions should apply only after the split. The mirror count is set while parsing the following actions encap/decap, header re-write, vlan push/pop. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a842dd04 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5: E-switch, Create a second level FDB flow table If firmware supports the forward action with a destination list that includes a flow table, create a second level FDB flow table. This is going to be used for flow based mirroring under the switchdev offloads mode. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
52fff327 |
|
16-May-2018 |
Chris Mi <chrism@mellanox.com> |
net/mlx5: E-Switch, Reorganize and rename fdb flow tables We have several fdb flow tables for each of the legacy and switchdev modes. In the switchdev mode, there are fast path and slow path flow tables. Towards adding more flow tables in upcoming patches, reorganize and rename the various existing ones to reflect their functionality. Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
10ff5359 |
|
18-Mar-2018 |
Shahar Klein <shahark@mellanox.com> |
net/mlx5e: Explicitly set source e-switch in offloaded TC rules Set a specific source e-switch when setting a rule that matches on the ingress port. Signed-off-by: Shahar Klein <shahark@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
56e858df |
|
18-Mar-2018 |
Rabie Loulou <rabiel@mellanox.com> |
net/mlx5e: Explicitly set destination e-switch in FDB rules Set a specific destination e-switch when setting a destination vport. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
38aa51c1 |
|
05-Apr-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Support offloaded TC flows with no matches on headers For example: tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop Note that for eswitch flows, we still always match on the source port. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d708f902 |
|
05-Apr-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Get the required HW match level while parsing TC flow matches Introduce levels of matching on headers of offloaded flows (none, L2, L3, L4) that follow the inline mode levels. This is pre-step for us to offload flows without any matches on headers. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
6acfbf38 |
|
31-Jan-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Offload tc vlan push/pop using HW action Currently, we are emulating the offload of vlan push/pop actions using global setup as done by commit f5f82476090f ("net/mlx5: E-Switch, Support VLAN actions in the offloads mode"). With newer NICs, we can apply a flow action for that matter, do that while keeping the emulated path for the older HW brands. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0c06897a |
|
28-Jan-2018 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Add core support for vlan push/pop steering action Newer NICs (ConnectX-5 and onward) can apply vlan pop or push as an action taking place during flow steering. Add the core bits for that. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f80be543 |
|
30-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Optimize HW steering tables in switchdev mode Under switchdev mode we insert an eswitch miss rule causing any unmatched traffic to be sent towards the PF vport. This miss rule can be optimized if we break it to two, one case is for multicast traffic and the other for unicast. Breaking the miss rule into two (unicast and multicast) allows the firmware to program the hardware in a more efficient way. Using ConncetX-5 Ex with IXIA and testpmd (which use IB representors): IXIA -> NIC -> PF -> IB representor -> NIC -> VF: - Without this optimization: 9.2 MPPS. - With this optimization: 18 MPPS. VF -> NIC -> IB representor-> PF -> NIC -> IXIA: - Without this optimization: 17 MPPS. - With this optimization: 23.4 MPPS. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
57cbd893 |
|
16-Jan-2018 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Move representors definition to a global scope In preparation for IB representors, move representors structs to a global scope, also expose functions needed for registration, unregistration, eswitch mode and creating a flow rule to direct traffic from SQs to the right VF. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
22215908 |
|
27-Sep-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Add callback to get representor device Add a callback interface to get a protocol device (per representor type). The Ethernet representors will expose their netdev via this interface. This functionality can be later used by IB representor in order to find the corresponding net device representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
b8a0dbe3 |
|
08-Nov-2017 |
Eugenia Emantayev <eugenia@mellanox.com> |
net/mlx5e: E-switch, Add steering drop counters Add flow counters to count packets dropped due to drop rules configured in eswitch egress and ingress ACLs. These counters will count VFs violations and incoming traffic drops. Will be presented on hypervisor via standard 'ip -s link show' command. Example: "ip -s link show dev enp5s0f0" 6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff RX: bytes packets errors dropped overrun mcast 0 0 0 0 0 2 TX: bytes packets errors dropped carrier collsns 1406 17 0 0 0 0 vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off RX: bytes packets mcast bcast dropped 1666 29 14 32 0 TX: bytes packets dropped 2880 44 2412 Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2c47bf80 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep Move struct mlx5_esw_sq which keeps send-to-vport rule to from the eswitch code to mlx5e and rename it to better reflect where it belongs Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
a4b97ab4 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Create generic header struct to be used by representors Now that we don't store type dependent data in struct mlx5_eswitch_rep we can create a generic interface, and representor type. struct mlx5_eswitch_rep will store an array of interfaces, each interface is used by a different representor type. Once we moved to a more generic interface, rdma driver representors can be added and utilize the same mechanism as the Ethernet driver representors use. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
5ed99fb4 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5e: Move ethernet representors data into separate struct Ethernet representors have a need to store data which is applicable only for them. Create a priv void pointer in struct mlx5_eswitch_rep and move mlx5e to store the relevant data there. As part of this change we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the E-Switch code will copy a priv value which is garbage. We also rename mlx5_eswitch_get_uplink_netdev() to mlx5_eswitch_get_uplink_priv() and make it return void *. This way E-Switch code doesn't need to deal with net devices and we leave the task of getting it to mlx5e. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
159fe639 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function In order for representors to send packets directly to VFs we use an E-Switch function which insert special rules into the HW. For symmetry create an E-Switch function that deletes these rules as well. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
f7a68945 |
|
07-Dec-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch In our pursuit to cleanup e-switch sub-module from mlx5e specific code, we move the functions that insert/remove the flow steering rules that allow mlx5e representors to send packets directly to VFs into the EN driver code. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
4c66df01 |
|
24-Aug-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Simplify representor load/unload callback API In the load() callback for loading representors we don't really need struct mlx5_eswitch but struct mlx5_core_dev, pass it directly. In the unload() callback for unloading representors we don't need the struct mlx5_eswitch argument, remove it. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e8d31c4d |
|
09-Aug-2017 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: E-Switch, Refactor vport representors initialization Refactor the init stage of vport representors registration. vport number and hw id can be assigned by the E-Switch driver and not by the netdevice driver. While here, make the error path of mlx5_eswitch_init() a reverse order of the good path, also use kcalloc to allocate an array instead of kzalloc. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
e80541ec |
|
05-Jun-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig Allow to selectively build the driver with or without sriov eswitch, VF representors and TC offloads. Also remove the need of two ndo ops structures (sriov & basic) and keep only one unified ndo ops, compile out VF SRIOV ndos when not needed (MLX5_ESWITCH=n), and for VF netdev calling those ndos will result in returning -EPERM. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
|
#
eeb66cdb |
|
04-Jun-2017 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Separate between E-Switch and MPFS Multi-Physical Function Switch (MPFs) is required for when multi-PF configuration is enabled to allow passing user configured unicast MAC addresses to the requesting PF. Before this patch eswitch.c used to manage the HW MPFS l2 table, E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's contexts update on unicast mac address list changes, to populate the PF's MPFS L2 table accordingly. In downstream patch we would like to allow compiling the driver without E-Switch functionalities, for that we move MPFS l2 table logic out of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to allow compiling out MPFS for those who don't want Multi-PF support. NIC PF netdevice will now directly update MPFS l2 table via the new MPFS API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain responsible of updating its MPFS l2 table on behalf of its VFs. Due to this change we also don't require enabling vport(0) (PF vport) unicast mac changes events anymore, for when SRIOV is not enabled. Which means E-Switch is now activated only on SRIOV activation, and not required otherwise. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
|
#
11c9c548 |
|
04-May-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Add cache for HW modify header IDs Packets belonging to flows which are different by matching may still need to go through the same header re-write. Add a cache for header re-write IDs keyed by the binary chain of modify header actions. The caching is supported for both eswitch and NIC use-cases, where the actual conversion of the code to use caching comes in next patches, one per use-case. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
0a0ab1d2 |
|
28-Feb-2017 |
Eli Cohen <eli@mellanox.com> |
net/mlx5: E-Switch, Avoid redundant memory allocation struct esw_mc_addr is a small struct that can be part of struct mlx5_eswitch. Define it as a field and not as a pointer and save the kzalloc call and then error flow handling. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
232c0013 |
|
19-Mar-2017 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Add support to neighbour update flow In order to offload TC encap rules, the driver does a lookup for the IP tunnel neighbour according to the output device and the destination IP given by the user. To keep tracking after the validity state of such neighbours, we keep the neighbours information (pair of device pointer and destination IP) in a hash table maintained at the relevant egress representor and register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update netevent, we search for a match among the cached neighbours entries used for encapsulation. In case the neighbour isn't valid, we can't offload the flow into the HW. We cache the flow (requested matching and actions) in the driver and offload the rule later, when the neighbour is resolved and becomes valid. When a flow is only cached in the driver and not offloaded into HW yet, we use EAGAIN return value to mark it internally, the TC ndo still returns success. Listen to kernel neighbour update netevents to trace relevant neighbours validity state: 1. If a neighbour becomes valid, offload the related rules to HW. 2. If the neighbour becomes invalid, remove the related rules from HW. 3. If the neighbour mac address was changed, update the encap header. Remove all the offloaded rules using the old encap header from the HW and insert new rules to HW with updated encap header. Access to the neighbors hash table is protected by RTNL lock of its caller or by the table's spinlock. Details of the locking/synchronization among the different actions applied on the neighbour table: Add/remove operations - protected by RTNL lock of its caller (all TC commands are protected by RTNL lock). Add and remove operations are initiated only when the user inserts/removes a TC rule into/from the driver. Lookup/remove operations - since the lookup operation is done from netevent notifier block, RTNL lock can't be used (atomic context). Use the table's spin lock to protect lookups from TC user removal operation. bh is used since netevent can be called from a softirq context. Lookup/add operations - The hash table access functions are taking care of the protection between lookup and add operations. When adding/removing encap headers and rules to/from the HW, RTNL lock is used. It can happen when: 1. The user inserts/removes a TC rule into/from the driver (TC commands are protected by RTNL lock of it's caller). 2. The driver gets neighbour notification event, which reports about neighbour validity status change. Before adding/removing encap headers and rules to/from the HW, RTNL lock is taken. A neighbour hash table entry should be freed when its encap list is empty. Since The neighbour update netevent notification schedules a neighbour update work that uses the neighbour hash entry, it can't be freed unconditionally when the encap list becomes empty during TC delete rule flow. Use reference count to protect from freeing neighbour hash table entry while it's still in use. When the user asks to unregister a netdvice used by one of the neigbours, neighbour removal notification is received. Then we take a reference on the neighbour and don't free it until the relevant encap entries (and flows) are marked as invalid (not offloaded) and removed from HW. As long as the encap entry is still valid (checked under RTNL lock) we can safely access the neighbour device saved on mlx5e_neigh struct. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
c1ae1152 |
|
25-Apr-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Move the encap entry structure from the eswitch header The encap entry structure isn't manipulated by the eswitch code, hence it can/needs to be removed from the eswitch header. Do that, and change it to have mlx5e_ prefix. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
45247bf2 |
|
25-Apr-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Remove encap entry pointer from the eswitch flow attributes Encap wise, the tc eswitch flow attribute struct needs to have only the encap ID which is programmed later to the HW and none of the higher level encap params, fix that. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
7768d197 |
|
25-Sep-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for encapsulation Implement the devlink e-switch encapsulation control set and get callbacks. Apply the value set by the user on the switchdev offloads mode when creating the fast FDB table where offloaded rules will be set. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
d7e75a32 |
|
25-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions This includes calling the parsing code that translates from pedit speak to the HW API, allocation (deallocation) of a modify header context and setting the modify header id associated with this context to the FTE of that flow. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
2a69cb9f |
|
19-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Introduce modify header structures, commands and steering action definitions Add the definitions related to creation/deletion of a modify header context and the modify header steering action which are used for HW packet header modify (re-write) as part of steering. Add as well the modify header id into two intermediate structs and set it to the FTE. Note that as the push/pop vlan steering actions are emulated by the ewitch management code, we're not breaking any compatibility while changing their values to make room for the modify header action which is not emulated and whose value is part of the FW API. The new bit values for the emulated actions are at the end of the possible range. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
375f51e2 |
|
21-Mar-2017 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured Changing the eswitch inline mode can potentially cause already configured flows not to match the policy. E.g. set policy L4, add some L4 rules, set policy to L2 --> bad! Hence we disallow it. Keep track of how many offloaded rules are now set and refuse inline mode changes if this isn't zero. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
d85cdccb |
|
21-Mar-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch Refactor the code to deal with add/del TC rules to have handler per NIC/E-switch offloading use case, and push the latter into the e-switch code. This provides better separation and is to be used in down-stream patch for applying a fix. Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c9497c98 |
|
15-Dec-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Add support for setting VF min rate Add support for SRIOV VF min rate guarantee by using the TSAR BW share weights mechanism. The TSAR BW share vport attribute represents the weight of that vport among the other vports weights which means that the actual vport BW percentage is the same vport weight percentage among the total vports weights sum. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
76f7444d |
|
05-Jan-2017 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5e: Use the full tunnel key info for encapsulation offload house-keeping Currently we use subset of the input tunnel key fields (id, ip daddr, dst port) which are provided by upper layers to indentify flows that should go through the same encapsulation and maintain the HW encapsulation table. This is redundant and can get us wrong. Instead, keep a copy of the ip tunnel info provided by the user through TC and have the tunnel key part as the key to our internal hash. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
#
726293f1 |
|
01-Dec-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Save the represntor netdevice as part of the representor Replace the representor private data to a net_device pointer holding the representor netdevice, instead of void pointer holding mlx5e_priv. It will be used by a new eswitch service function, returning the uplink representor netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bffaa916 |
|
22-Nov-2016 |
Roi Dayan <roid@mellanox.com> |
net/mlx5: E-Switch, Add control for inline mode Implement devlink show and set of HW inline-mode. The supported modes: none, link, network, transport. We currently support one mode for all vports so set is done on all vports. When eswitch is first initialized the inline-mode is queried from the FW. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a54e20b4 |
|
07-Nov-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Add basic TC tunnel set action for SRIOV offloads In mlx5 HW, encapsulation is offloaded by the steering rule having index into an encapsulation table containing the entire set of headers to be added by the HW. The driver sets these headers in a buffer when we are offloading the action. The code maintains mlx5_encap_entry for each encap header it has encountered when attempted to offload TC tunnel set action. This entry maintains a linked list of all the flows sharing the same encap header, when the last flow is removed from the list the encap entry is removed. The actual encap_header is allocated by the driver in the hardware only if we have layer two neighbour info when the encap entry is created. While the flow is in the driver, the driver holds a reference on the neighbour. When a new flow with encap action is inserted, the code first checks if the required encap entry exists according to the tunnel set parameters. If it does the encap is shared, otherwise a new mlx5_encap_entry is created. TC action parsing implementation in the driver assumes that tunnel set action is provided in the same order set by the user, e.g before the mirred_redirect action. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
74491de9 |
|
31-Aug-2016 |
Mark Bloch <markb@mellanox.com> |
net/mlx5: Add multi dest support Currently when calling mlx5_add_flow_rule we accept only one flow destination, this commit allows to pass multiple destinations. This change forces us to change the return structure to a more flexible one. We introduce a flow handle (struct mlx5_flow_handle), it holds internally the number for rules created and holds an array where each cell points the to a flow rule. From the consumers (of mlx5_add_flow_rule) point of view this change is only cosmetic and requires only to change the type of the returned value they store. From the core point of view, we now need to use a loop when allocating and deleting rules (e.g given to us a flow handler). Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
bd77bf1c |
|
11-Aug-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Add SRIOV VF max rate configuration support Implement the vf set rate ndo by modifying the TSAR vport rate limit. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
1bd27b11 |
|
11-Aug-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Introduce E-switch QoS management Add TSAR to the eswitch which will act as the vports rate limiter. Create/Destroy TSAR on Enable/Dsiable SRIOV. Attach/Detach vport to eswitch TSAR on Enable/Disable vport. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
#
f5f82476 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Support VLAN actions in the offloads mode Many virtualization systems use a policy under which a vlan tag is pushed to packets sent by guests, and popped before the packet is forwarded to the VM. The current generation of the mlx5 HW doesn't fully support that on a per flow level. As such, we are addressing the above common use case with the SRIOV e-Switch abilities to push vlan into packets sent by VFs and pop vlan from packets forwarded to VFs. The HW can match on the correct vlan being present in packets forwarded to VFs (eSwitch steering is done before stripping the tag), so this part is offloaded as is. A common practice for vlans is to avoid both push vlan and pop vlan for inter-host VM/VM (east-west) communication because in this case, push on egress cancels out with pop on ingress. For supporting that, we use a global eswitch vlan pop policy, hence allowing guest A to communicate with both remote VM B and local VM C. This works since the HW pops the vlan only if it exists (e.g for C --> A packets but not for B --> A packets). On the slow path, when a VF vport has an offloaded flow which involves pushing vlans, wheres another flow is not currently offloaded, the packets from the 2nd flow seen by the VF representor on the host have vlan. The VF rep driver removes such vlan before calling into the host networking stack. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
776b12b6 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Put elements related to offloaded TC rule in one struct Put the representors related to the source and dest vports and the action in struct mlx5_esw_flow_attr which is used while setting the FDB rule. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
e33dfe31 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan The HW can be programmed to push vlan, pop vlan or both. A factorization step towards using the push/pop capabilties in the eswitch offloads mode. This patch doesn't add new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
bac9b6aa |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set vport representor fields explicitly on registration The structure we use for the eswitch vport representor (mlx5_eswitch_rep) has some fields which are set from upper layers in the driver when they register the rep. Use explicit setting on registration time for them and avoid global memcpy. This patch doesn't add new functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9deb2241 |
|
22-Sep-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Set the vport when registering the uplink rep Set the vport value in the PF entry to be that of the uplink so we can use it blindly over the tc / eswitch offload code without translating it each time we deal with the uplink representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
766a0e97 |
|
18-Sep-2016 |
Baoyou Xie <baoyou.xie@linaro.org> |
net/mlx5: clean function declarations in eswitch.c up We get 2 warnings when building kernel with W=1: drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:463:5: warning: no previous prototype for 'esw_offloads_init' [-Wmissing-prototypes] drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:521:6: warning: no previous prototype for 'esw_offloads_cleanup' [-Wmissing-prototypes] In fact, both functions are declared in drivers/net/ethernet/mellanox/mlx5/core/eswitch.c,but should be declared in a header file, thus can be recognized in other file. So this patch moves the declarations into drivers/net/ethernet/mellanox/mlx5/core/eswitch.h Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1ab2068a |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Implement vports admin state backup/restore Save the user configuration in the vport sturct. Restore saved old configuration upon vport enable. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
62a9b90a |
|
09-Sep-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: Implement eswitch attach/detach flows Needed for lightweight and modular internal/pci error handling. Implement eswitch attach function which allocates/starts hw related resources. Implement eswitch detach function which releases/stops hw related resources. Init/cleanup function only handle eswitch software context allocation and destruction. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dbe413e3 |
|
18-Aug-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Retrieve the switchdev id from the firmware only once Avoid firmware command execution each time the switchdev HW ID attr get call is made. We do that by reading the ID (PF NIC MAC) only once at load time and store it on the representor structure. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3d80d1a2 |
|
14-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to configure rules for the offloaded mode This allows for upper levels in the driver, e.g the TC offload code to add e-switch offloaded steering rules. The caller provides the rule spec for matching, action, source and destination vports. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1033665e |
|
14-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Use two priorities for SRIOV offloads mode In the offloads mode, some slow path rules are added by the driver (e.g send-to-vport), while offloaded rules are to be added from upper layers. The slow path rules have lower priority and we don't want matching on offloaded rules to suffer from extra steering hops related to the slow path rules. We use two priorities, one for offloaded rules (fast path), and one for the control rules (slow path). To allow for that, we enable two priorities for the FDB namespace in the FS core code. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
cb67b832 |
|
01-Jul-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5e: Introduce SRIOV VF representors Implement the relevant profile functions to create mlx5e driver instance serving as VF representor. When SRIOV offloads mode is enabled, each VF will have a representor netdevice instance on the host. To do that, we also export set of shared service functions from en_main.c, such that they can be used by both NIC and repsresentors netdevs. The newly created representor netdevice has a basic set of net_device_ops which are the same ndo functions as the NIC netdevice and an ndo of it's own for phys port name. The profiling infrastructure allow sharing code between the NIC and the vport representor even though the representor has only a subset of the NIC functionality. The VF reps and the PF which is used in that mode to represent the uplink, expose switchdev ops. Currently the only op supposed is attr get for the port parent ID which here serves to identify net-devices belonging to the same HW E-Switch. Other than that, no offloading is implemented and hence switching functionality is achieved if one sets SW switching rules, e.g using tc, bridge or ovs. Port phys name (ndo_get_phys_port_name) is implemented to allow exporting to user-space the VF vport number and along with the switchdev port parent id (phys_switch_id) enable a udev base consistent naming scheme: SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \ ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}" where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is the name of the PF netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
127ea380 |
|
01-Jul-2016 |
Hadar Hen Zion <hadarh@mellanox.com> |
net/mlx5: Add Representors registration API Introduce E-Switch registration/unregister representors functions. Those functions are called by the mlx5e driver when the PF NIC is created upon pci probe action regardless of the E-Switch mode (NONE, LEGACY or OFFLOADS). Adding basic E-Switch database that will hold the vport represntors upon creation. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
feae9087 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: Add devlink interface The devlink interface is initially used to set/get the mode of the SRIOV e-switch. Currently, these are only stubs for get/set, down-stream patch will actually fill them out. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
fed9ce22 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to create vport rx rules Add the API to create vport rx rules of the form packet meta-data :: vport == $VPORT --> $TIR where the TIR is opened by this VF representor. This logic will by used for packets that didn't match any rule in the e-switch datapath and should be received into the host OS through the netdevice that represents the VF they were sent from. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
c116c6ee |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add offloads table Belongs to the NIC offloads name-space, and to be used as part of the SRIOV offloads logic to steer packets that hit the e-switch miss rule to the TIR of the relevant VF representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
ab22be9b |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add API to create send-to-vport rules Add the API to create send-to-vport e-switch rules of the form packet meta-data :: send-queue-number == $SQN and source-vport == 0 --> $VPORT These rules are to be used for a send-to-vport logic which conceptually bypasses the "normal" steering rules currently present at the e-switch datapath. Such rule should apply only for packets that originate in the e-switch manager vport (0) and are sent for a given SQN which is used by a given VF representor device, and hence the matching logic. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3aa33572 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add miss rule for offloads mode In the sriov offloads mode, packets that are not matched by any other rule should be sent towards the e-switch manager for further processing. Add such "miss" rule which matches ANY packet as the last rule in the e-switch FDB and programs the HW to send the packet to vport 0 where the e-switch manager runs. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
69697b6e |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add support for the sriov offloads mode Unlike the legacy mode, here, forwarding rules are not learned by the driver per events on macs set by VFs/VMs into their vports, but rather should be programmed by higher-level SW entities. Saying that, still, in the offloads mode (SRIOV_OFFLOADS), two flow groups are created by the driver for management (slow path) purposes: The first group will be used for sending packets over e-switch vports from the host OS where the e-switch management code runs, to be received by VFs. The second group will be used by a miss rule which forwards packets toward the e-switch manager. Further logic will trap these packets such that the receiving net-device as seen by the networking stack is the representor of the vport that sent the packet over the e-switch data-path. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
6ab36e35 |
|
01-Jul-2016 |
Or Gerlitz <ogerlitz@mellanox.com> |
net/mlx5: E-Switch, Add operational mode to the SRIOV e-Switch Define three modes for the SRIOV e-switch operation, none (SRIOV_NONE, none of the VF vports are enabled), legacy (SRIOV_LEGACY, the current mode) and sriov offloads (SRIOV_OFFLOADS). Currently, when in SRIOV, only the legacy mode is supported, where steering rules are of the form: destination mac --> VF vport This patch does not change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
1edc57e2 |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Implement trust vf ndo - Add support to configure trusted vf attribute through trust_vf_ndo. - Upon VF trust setting change we update vport context to refresh allmulti/promisc or any trusted vf attributes that we didn't trust the VF for before. - Lock the eswitch state lock on vport event in order to synchronise the vport context updates , this will prevent contention with vport trust setting change which will trigger vport mac list update. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
a35f71f2 |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling Add promisc_change as a trigger to vport context change event. Add set vport promisc/allmulti functions to add vport to promiscuous flowtable rules. Upon promisc/allmulti rx mode vf request add the vport to the relevant promiscuous group (Allmulti/Promisc group) so the relevant traffic will be forwarded to it. Upon allmulti vf request add the vport to each existing multicast fdb rule. Upon adding/removing mcast address from a vport, update all other allmulti vports. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
78a9199b |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Add promiscuous and allmulti FDB flowtable groups Add promiscuous and allmulti steering groups in FDB table. Besides the full match L2 steering rules group, we added two more groups to catch the "miss" rules traffic: * Allmulti group: One rule that forwards any mcast traffic coming from either uplink or VFs/PF vports * Promisc group: One rule that forwards all unmatched traffic coming from uplink. Needed for downstream privileged VF promisc and allmulti support. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
f942380c |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk Configure ingress and egress vport ACL rules according to spoofchk admin parameters. Ingress ACL flow table rules: if (!spoofchk && !vst) allow all traffic. else : 1) one of the following rules : * if (spoofchk && vst) allow only untagged traffic with smac=original mac sent from the VF. * if (spoofchk && !vst) allow only traffic with smac=original mac sent from the VF. * if (!spoofchk && vst) allow only untagged traffic. 2) drop all traffic that didn't hit #1. Add support for set vf spoofchk ndo. Add non zero mac validation in case of spoofchk to set mac ndo: when setting new mac we need to validate that the new mac is not zero while the spoofchk is on because it is illegal combination. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
dfcb1ed3 |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode Configure ingress and egress vport ACL rules according to vlan and qos admin parameters. Ingress ACL flow table rules: 1) drop any tagged packet sent from the VF 2) allow other traffic (default behavior) Egress ACL flow table rules: 1) allow only tagged traffic with vlan_tag=vst_vid. 2) drop other traffic. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
5742df0f |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Introduce VST vport ingress/egress ACLs Create egress/ingress ACLs per VF vport at vport enable. Ingress ACL: - one flow group to drop all tagged traffic in VST mode. Egress ACL: - one flow group that allows only untagged traffic with smac that is equals to the original mac (anti-spoofing). - one flow group that allows only untagged traffic. - one flow group that allows only smac that is equals to the original mac (anti-spoofing). (note: only one of the above group has active rule) - star rule will be used to drop all other traffic. By default no rules are generated, unless VST is explicitly requested. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
831cae1d |
|
03-May-2016 |
Mohamad Haj Yahia <mohamad@mellanox.com> |
net/mlx5: E-Switch, Replace vport spin lock with synchronize_irq() Vport spin lock can be replaced with synchronize_irq() in the right place, this will remove the need of locking inside irq context. Locking in esw_enable_vport is not required since vport events are yet to be enabled, and at esw_disable_vport it is sufficient to synchronize_irq() to guarantee no further vport events handlers will be scheduled. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
86d722ad |
|
10-Dec-2015 |
Maor Gottlieb <maorg@mellanox.com> |
net/mlx5: Use flow steering infrastructure for mlx5_en Expose the new flow steering API and remove the old one. Few changes are required: 1. The Ethernet flow steering follows the existing implementation, but uses the new steering API. The old flow steering implementation is removed. 2. Move the E-switch FDB management to use the new API. 3. When driver is loaded call to mlx5_init_fs which initialize the flow steering tree structure, open namespaces for NIC receive and for E-switch FDB. 4. Call to mlx5_cleanup_fs when the driver is unloaded. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
3b751a2a |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Introduce get vf statistics Add support to get VF statistics using query vport counter command. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
9e7ea352 |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Introduce set vport vlan (VST mode) Add query and modify functions to control client vlan and qos striping or insertion, in E-Switch vports contexts. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
77256579 |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Introduce Vport administration functions Implement set VF mac/link state and query VF config to be used later in nedev VF ndos or any other management API. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
81848731 |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: E-Switch, Add SR-IOV (FDB) support Enabling E-Switch SRIOV for nvfs+1 vports. Create E-Switch FDB for L2 UC/MC mac steering between VFs/PF and external vport (Uplink). FDB contains forwarding rules such as: UC MAC0 -> vport0(PF). UC MAC1 -> vport1. UC MAC2 -> vport2. MC MACX -> vport0, vport2, Uplink. MC MACY -> vport1, Uplink. For unmatched traffic FDB has the following default rules: Unmached Traffic (src vport != Uplink) -> Uplink. Unmached Traffic (src vport == Uplink) -> vport0(PF). FDB rules population: Each NIC vport (VF) will notify E-Switch manager of its UC/MC vport context changes via modify vport context command, which will be translated to an event that will be handled by E-Switch manager (PF) which will update FDB table accordingly. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
#
073bb189 |
|
01-Dec-2015 |
Saeed Mahameed <saeedm@mellanox.com> |
net/mlx5: Introducing E-Switch and l2 table E-Switch is the software entity that represents and manages ConnectX4 inter-HCA ethernet l2 switching. E-Switch has its own Virtual Ports, each Vport/vNIC/VF can be connected to the device through a vport of an e-switch. Each e-switch is managed by one vNIC identified by HCA_CAP.vport_group_manager (usually it is the PF/vport[0]), and its main responsibility is to forward each packet to the right vport. e-Switch needs to manage its own l2-table and FDB tables. L2 table is a flow table that is managed by FW, it is needed for Multi-host (Multi PF) configuration for inter HCA switching between PFs. FDB table is a flow table that is totally managed by e-Switch driver, its main responsibility is to switch packets between e-Swtich internal vports and uplink vport that belong to the same. This patch introduces only e-Swtich l2 table management, FDB managemnt will come later when ethernet SRIOV/VFs will be enabled. preperation for ethernet sriov and l2 table management. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|