History log of /linux-master/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
Revision	Date	Author	Comments
# f7a48511	03-Jun-2023	Yevgeny Kliteynik <kliteyn@nvidia.com>	net/mlx5e: TC, CT: Offload ct clear only once Non-clear CT action causes a flow rule split, while CT clear action doesn't and is just a header-rewrite to the current flow rule. But ct offload is done in post_parse and is per ct action instance, so ct clear offload is parsed multiple times, while it is deleted only once. Fix this by post_parsing the ct action only once per flow attribute (which is per flow rule) by using an offloaded ct_attr flag. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# b100573a	11-Apr-2023	Chris Mi <cmi@nvidia.com>	net/mlx5e: TC, Add null pointer check for hardware miss support The cited commits add hardware miss support to tc action. But if the rules can't be offloaded, the pointers are null and system will panic when accessing them. Fix it by checking null pointer. Fixes: 08fe94ec5f77 ("net/mlx5e: TC, Remove special handling of CT action") Fixes: 6702782845a5 ("net/mlx5e: TC, Set CT miss to the specific ct action instance") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 8ac04a28	24-Mar-2023	Vlad Buslov <vladbu@nvidia.com>	net/mlx5e: Release the label when replacing existing ct entry Cited commit doesn't release the label mapping when replacing existing ct entry which leads to following memleak report: unreferenced object 0xffff8881854cf280 (size 96): comm "kworker/u48:74", pid 23093, jiffies 4296664564 (age 175.944s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000002722d368>] __kmalloc+0x4b/0x1c0 [<00000000cc44e18f>] mapping_add+0x6e8/0xc90 [mlx5_core] [<000000003ad942a7>] mlx5_get_label_mapping+0x66/0xe0 [mlx5_core] [<00000000266308ac>] mlx5_tc_ct_entry_create_mod_hdr+0x1c4/0xf50 [mlx5_core] [<000000009a768b4f>] mlx5_tc_ct_entry_add_rule+0x16f/0xaf0 [mlx5_core] [<00000000a178f3e5>] mlx5_tc_ct_block_flow_offload_add+0x10cb/0x1f90 [mlx5_core] [<000000007b46c496>] mlx5_tc_ct_block_flow_offload+0x14a/0x630 [mlx5_core] [<00000000a9a18ac5>] nf_flow_offload_tuple+0x1a3/0x390 [nf_flow_table] [<00000000d0881951>] flow_offload_work_handler+0x257/0xd30 [nf_flow_table] [<000000009e4935a4>] process_one_work+0x7c2/0x13e0 [<00000000f5cd36a7>] worker_thread+0x59d/0xec0 [<00000000baed1daf>] kthread+0x28f/0x330 [<0000000063d282a4>] ret_from_fork+0x1f/0x30 Fix the issue by correctly releasing the label mapping. Fixes: 94ceffb48eac ("net/mlx5e: Implement CT entry update") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 08fe94ec	25-Jan-2023	Paul Blakey <paulb@nvidia.com>	net/mlx5e: TC, Remove special handling of CT action CT action has special treating as a per-flow action since it was assumed to be singular and reordered to be first on the action list. This isn't the case anymore, and can be converted to just a FWD to pre_ct + MODIFY_HEAD, and handled per post_act rule. Remove special handling of CT action, and offload it while post parsing each ct attribute. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 67027828	17-Feb-2023	Paul Blakey <paulb@nvidia.com>	net/mlx5e: TC, Set CT miss to the specific ct action instance Currently, CT misses restore the missed chain on the tc skb extension so tc will continue from the relevant chain. Instead, restore the CT action's miss cookie on the extension, which will instruct tc to continue from this specific CT action instance on the relevant filter's action list. Map the CT action's miss_cookie to a new miss object (ACT_MISS), and use this miss mapping instead of the current chain miss object (CHAIN_MISS) for CT action misses. To restore this new miss mapping value, add a RX restore rule for each such mapping value. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# 235ff07d	17-Feb-2023	Paul Blakey <paulb@nvidia.com>	net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG This reg usage is always a mapped object, not necessarily containing chain info. Rename to properly convey what it stores. This patch doesn't change any functionality. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# 03a283cd	17-Feb-2023	Paul Blakey <paulb@nvidia.com>	net/mlx5: Kconfig: Make tc offload depend on tc skb extension Tc skb extension is a basic requirement for using tc offload to support correct restoration on action miss. Depend on it. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# f869bcb0	06-Nov-2022	Vlad Buslov <vladbu@nvidia.com>	net/mlx5e: Allow offloading of ct 'new' match Allow offloading filters that match on conntrack 'new' state in order to enable UDP NEW offload in the following patch. Unhardcode ct 'established' from ct modify header infrastructure code and determine correct ct state bit according to the metadata action 'cookie' field. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 94ceffb4	01-Dec-2022	Vlad Buslov <vladbu@nvidia.com>	net/mlx5e: Implement CT entry update With support for UDP NEW offload the flow_table may now send updates for existing flows. Support properly replacing existing entries by updating flow restore_cookie and replacing the rule with new one with the same match but new mod_hdr action that sets updated ctinfo. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 29744a10	01-Feb-2023	Vlad Buslov <vladbu@nvidia.com>	net: flow_offload: provision conntrack info in ct_metadata In order to offload connections in other states besides "established" the driver offload callbacks need to have access to connection conntrack info. Flow offload intermediate representation data structure already contains that data encoded in 'cookie' field, so just reuse it in the drivers. Reject offloading IP_CT_NEW connections for now by returning an error in relevant driver callbacks based on value of ctinfo. Support for offloading such connections will need to be added to the drivers afterwards. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 849190e3	27-Nov-2022	Chris Mi <cmi@nvidia.com>	net/mlx5e: CT: Fix ct debugfs folder name Need to use sprintf to build a string instead of sscanf. Otherwise dirname is null and both "ct_nic" and "ct_fdb" won't be created. But it's redundant anyway as the driver could be in switchdev mode but still add nic rules. So use "ct" as folder name. Fixes: 77422a8f6f61 ("net/mlx5e: CT: Add ct driver counters") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 05bb74c2	31-Oct-2022	Oz Shlomo <ozsh@nvidia.com>	net/mlx5e: CT, optimize pre_ct table lookup The pre_ct table realizes in hardware the act_ct cache logic, bypassing the CT table if the ct state was already set by a previous ct lookup. As such, the pre_ct table will always miss for chain 0 filters. Optimize the pre_ct table lookup for rules installed on chain 0. Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 22df2e93	12-Jul-2022	Roi Dayan <roid@nvidia.com>	net/mlx5: CT: Remove warning of ignore_flow_level support for non PF ignore_flow_level isn't supported for SFs, and so it causes post_act and ct to warn about it per SF. Apply the warning only for PF. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 17c5da03	31-Oct-2021	Jianbo Liu <jianbol@nvidia.com>	net/mlx5e: Add generic macros to use metadata register mapping There are many definitions to get bits and mask for different types of metadata register mapping, add generic macros to unify them. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 6c4e8fa0	21-Jun-2022	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT: Use own workqueue instead of mlx5e priv Allocate a ct priv workqueue instead of using mlx5e priv one so flushing will only be of related CT entries. Also move flushing of the workqueue before rhashtable destroy otherwise entries won't be valid. Fixes: b069e14fff46 ("net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 1f2856cd	23-May-2022	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Fix header-rewrite re-use for tupels Tuple entries that don't have nat configured for them which are added to the ct nat table will always create a new modify header, as we don't check for possible re-use on them. The same for tuples that have nat configured for them but are added to ct table. Fix the above by only avoiding wasteful re-use lookup for actually natted entries in ct nat table. Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 77422a8f6	17-May-2022	Saeed Mahameed <saeedm@nvidia.com>	net/mlx5e: CT: Add ct driver counters Connection offload is translated to multiple rules over several hardware flow tables. Unhandled end-cases may cause a hardware resource leak causing multiple system symptoms such as a host memory leak, decreased performance and other scale related issues. Export the current number of firmware FTEs related to the CT table as a debugfs counter. Also add a dropped packets counter to help debug packets dropped on restore failure. To show the offloaded count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/offloaded To show the dropped count: cat /sys/kernel/debug/mlx5/<PCI>/ct_nic/rx_dropped Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Roi Dayan <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com>
# 7134c602	22-Apr-2022	Haowen Bai <baihaowen@meizu.com>	net/mlx5: Remove useless kfree After alloc fail, we do not need to kfree. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# b069e14f	29-Mar-2022	Paul Blakey <paulb@nvidia.com>	net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release __mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT, if that is the last reference to that tuple, the actual deletion of the tuple can happen after the FT is already destroyed and freed. Flush the used workqueue before destroying the ct FT. Fixes: a2173131526d ("net/mlx5e: CT: manage the lifetime of the ct entry object") Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 087032ee	23-Feb-2022	Ariel Levkovich <lariel@nvidia.com>	net/mlx5e: TC, Fix ct_clear overwriting ct action metadata ct_clear action is translated to clearing reg_c metadata which holds ct state and zone information using mod header actions. These actions are allocated during the actions parsing, as part of the flow attributes main mod header action list. If ct action exists in the rule, the flow's main mod header is used only in the post action table rule, after the ct tables which set the ct info in the reg_c as part of the ct actions. Therefore, if the original rule has a ct_clear action followed by a ct action, the ct action reg_c setting will be done first and will be followed by the ct_clear resetting reg_c and overwriting the ct info. Fix this by moving the ct_clear mod header actions allocation from the ct action parsing stage to the ct action post parsing stage where it is already known if ct_clear is followed by a ct action. In such case, we skip the mod header actions allocation for the ct clear since the ct action will write to reg_c anyway after clearing it. Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# ebf04231	23-Feb-2022	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Remove extra rhashtable remove on tuple entries On tuple offload del command, tuples are tried to be removed twice from the hashtable, once directly via mlx5_tc_ct_entry_remove_from_tuples() and a second time in the following mlx5_tc_ct_entry_put()-> mlx5_tc_ct_entry_del()->mlx5_tc_ct_entry_remove_from_tuples() call. This doesn't cause any issue since rhashtable first checks if the removed object exists in the hashtable. Remove the extra mlx5_tc_ct_entry_remove_from_tuples(). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 3ee61ebb	29-Sep-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Add software steering ct flow steering provider fs_core layer adds extra book keeping that is either unneeded for CT, or unused by the underlying software steering, such as allocating FTEs and FTE ids, saving the match key and mask, and autogroups management. On top of that, direct steering has a translation layer (fs_dr) from PRM commands to direct steering objects, for example, creating temporary dr_action objects. This has a performance impact when dealing with CT high insertion rate. To use direct steering (smfs) directly for ct, add a tc ct fs smfs implementation. Instead of dmfs autogroups, smfs ct fs uses one of 4 predefined dr matchers in CT and CT-NAT tables, for each combination of tuple ethertype (ipv4/ipv6), and tuple ip_proto (udp/tcp) that is currently used by nf flow table flow offload. At rule insertions, validate the flow rule fits one of the predefined matchers, and insert to it. To fill the dr_actions of the rule efficiently, create the fwd to post_ct tbl dr_action at fs init, the count dr_action at counter creation, and re-use the already pre-allocated modify header dr_action. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 76909000	23-Nov-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Introduce a platform for multiple flow steering providers Currently, fs_core layer provides flow steering services to the driver including: autogroups, allocating FTEs (flow table entries) and FTE ids, and support of fte action modification. If software steering is then configured, rule insertion will go through a translation layer from firmware buffers to software steering objects (see fs_dr.c). The connection tracking table is a system table that is not directly controlled by the user and is a very high scale table. These fs_core services introduce an overhead that may be optimized by using software steering API directly. Introduce ct flow steering interface to allow multiple flow steering providers. Use the new interface to implement the current dmfs (device managed flow steering) provider which uses fs_core insertion. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 1918ace1	24-Feb-2022	Toshiaki Makita <toshiaki.makita1@gmail.com>	net/mlx5: Support GRE conntrack offload Support GREv0 without NAT. Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
# a8128326	01-Dec-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: Use multi table support for CT and sample actions CT and sample actions use post actions for their implementation. Flag those actions as multi table actions so the post act infrastructure will handle the post actions allocation. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 8300f225	08-Aug-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: Create new flow attr for multi table actions Some TC actions use post actions for their implementation. For example CT and sample actions. Create a new flow attr after each multi table action and create a post action rule for it. First flow attr being offloaded normally and linked to the next attr (post action rule) with setting an id on reg_c. Post action rules match the id on reg_c and continue to the next one. The flow counter is allocated on the last rule. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# a572c0a7	19-Dec-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT, Remove redundant flow args from tc ct calls The flow arg is not being used so remove it. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 73a3f1bc	19-Dec-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: TC, Store mapped tunnel id on flow attr In preparation for multiple attr instances the tunnel_id should be attr specific and not flow specific. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# e5d4e1da	19-Dec-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: Refactor eswitch attr flags to just attr flags The flags are flow attrs and not esw specific attr flags. Refactor to remove the esw prefix and move from eswitch.h to en_tc.h where struct mlx5_flow_attr exists. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# efe6f961	15-Dec-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT, Don't set flow flag CT for ct clear flow ct clear action is a normal flow with a modify header that sets the registers to 0. There is no need for any special handling in tc_ct.c. Parsing of ct clear action still allocates mod acts to set 0 on the registers and the driver continues to add a normal rule with modify hdr context. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# c9c079b4	03-Jan-2022	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Set flow source hint from provided tuple device Get originating device from tuple offload metadata match ingress_ifindex, and set flow_source hint to either LOCAL for vf/sf reps, UPLINK for uplink/wire/tunnel devices/bond, or ANY (as before this patch) for all others. This allows lower layer (software steering or firmware) to insert the tuple rule only in one table (either rx or tx) instead of two (rx and tx). Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 0164a9bd	03-Nov-2021	Yihao Han <hanyihao@vivo.com>	net/mlx5: TC, using swap() instead of tmp variable swap() was used instead of the tmp variable to swap values Signed-off-by: Yihao Han <hanyihao@vivo.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 1cfd3490	25-Aug-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Allow static allocation of mod headers As each CT rule uses at least 4 modify header actions, each rule causes at least 3 reallocations by the mod header actions api. Allow initial static allocation of the mod acts array, and use it for CT rules. If the static allocation is exceeded go back to dynamic allocation. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com>
# 2c0e5cf5	05-Jul-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5e: Refactor mod header management API For all mod hdr related functions to reside in a single self contained component (mod_hdr.c), refactor alloc() and add get_id() so that user won't rely on internal implementation, and move both to mod_hdr component. Rename the prefix to mlx5e_mod_hdr_* as other mod hdr functions. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 806401c2	08-Nov-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT, Fix multiple allocations and memleak of mod acts CT clear action offload adds additional mod hdr actions to the flow's original mod actions in order to clear the registers which hold ct_state. When such flow also includes encap action, a neigh update event can cause the driver to unoffload the flow and then reoffload it. Each time this happens, the ct clear handling adds that same set of mod hdr actions to reset ct_state until the max of mod hdr actions is reached. Also the driver never releases the allocated mod hdr actions and causing a memleak. Fix above two issues by moving CT clear mod acts allocation into the parsing actions phase and only use it when offloading the rule. The release of mod acts will be done in the normal flow_put(). backtrace: [<000000007316e2f3>] krealloc+0x83/0xd0 [<00000000ef157de1>] mlx5e_mod_hdr_alloc+0x147/0x300 [mlx5_core] [<00000000970ce4ae>] mlx5e_tc_match_to_reg_set_and_get_id+0xd7/0x240 [mlx5_core] [<0000000067c5fa17>] mlx5e_tc_match_to_reg_set+0xa/0x20 [mlx5_core] [<00000000d032eb98>] mlx5_tc_ct_entry_set_registers.isra.0+0x36/0xc0 [mlx5_core] [<00000000fd23b869>] mlx5_tc_ct_flow_offload+0x272/0x1f10 [mlx5_core] [<000000004fc24acc>] mlx5e_tc_offload_fdb_rules.part.0+0x150/0x620 [mlx5_core] [<00000000dc741c17>] mlx5e_tc_encap_flows_add+0x489/0x690 [mlx5_core] [<00000000e92e49d7>] mlx5e_rep_update_flows+0x6e4/0x9b0 [mlx5_core] [<00000000f60f5602>] mlx5e_rep_neigh_update+0x39a/0x5d0 [mlx5_core] Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 504e1572	11-Jul-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: Allow skipping counter refresh on creation CT creates a counter for each CT rule, and for each such counter, fs_counters tries to queue mlx5_fc_stats_work() work again via mod_delayed_work(0) call to refresh all counters. This call has a large performance impact when reaching high insertion rate and accounts for ~8% of the insertion time when using software steering. Allow skipping the refresh of all counters during counter creation. Change CT to use this refresh skipping for its counters. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# ae2ee3be	31-Aug-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Remove warning of ignore_flow_level support for VFs ignore_flow_level isn't supported for VFs, and so it causes post_act and ct to warn about it. Disabling CT for VFs would require a driver update to enable CT again once firmware supports this, so instead remove this warning specifically for VFs. This way, it could be automatically enabled on future firmwares where VFs support ignore_flow_level capability. Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 88594d83	30-Sep-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Fix missing cleanup of ct nat table on init failure If CT fails to initialize its rhashtables, it doesn't destroy the ct nat global table. Destroy the ct nat global table on ct init failure. Fixes: d7cade513752 ("net/mlx5e: check return value of rhashtable_init") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# d7cade51	26-Sep-2021	MichelleJin <shjy180909@gmail.com>	net/mlx5e: check return value of rhashtable_init When rhashtable_init() fails, it returns -EINVAL. However, since error return value of rhashtable_init is not checked, it can cause use of uninitialized pointers. So, fix unhandled errors of rhashtable_init. Signed-off-by: MichelleJin <shjy180909@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# f0da4daa	16-Aug-2021	Chris Mi <cmi@nvidia.com>	net/mlx5e: Refactor ct to use post action infrastructure Move post action table management to common library providing add/del/get API. Refactor the ct action offload to use the common API. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 27997978	01-Jun-2021	Chris Mi <cmi@nvidia.com>	net/mlx5e: CT, Use xarray to manage fte ids IDR is deprecated. Use xarray instead. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 2198b932	03-Aug-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: Use shared mappings for restoring from metadata FTEs are added with mapped metadata which is saved per eswitch. When uplink reps are bonded and we are in a single FDB mode, we could fail to find metadata which was stored on one eswitch mapping but not the other or with a different id. To resolve this issue use shared mapping between eswitch ports. We do not have any conflict using a single mapping, for a type, between the ports. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# ed2fe7ba	10-Mar-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5e: TC: Use bit counts for register mapping To prepare for next patch where we will use a non-byte aligned mapping, change all byte counts in register mapping to bits. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 7fac5c2e	19-Apr-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Avoid reusing modify header context for natted entries Currently the driver is designed to reuse header modify context entries. Natted entries will always have a unique modify header, as such the modify header hashtable lookup is introducing an overhead. When the hashtable size exceeded 200k entries the tested insertion rate dropped from ~10k entries/sec to ~300 entries/sec. Don't use the re-use mechanism when creating modify headers for natted tuples. Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 74097a0d	11-Mar-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT, Remove newline from ct_dbg call ct_dbg() already adds a newline. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 116c76c5	15-Mar-2021	Ariel Levkovich <lariel@nvidia.com>	net/mlx5: CT: Add support for matching on ct_state inv and rel flags Add support for matching on ct_state inv and rel flags. Currently the support is only for match on -inv and -rel. Matching on +inv and +rel will be rejected. Example: $ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \ ct_state -est-rel+trk \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est-inv \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 69e2916e	21-Sep-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5: CT: Add support for mirroring Add support for mirroring before the CT action by splitting the pre ct rule. Mirror outputs are done first on the tc chain,prio table rule (the fwd rule), which will then forward to a per port fwd table. On this fwd table, we insert the original pre ct rule that forwards to ct/ct nat table. Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 9f4d9283	09-Mar-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: Alloc flow spec using kvzalloc instead of kzalloc flow spec is not small and we do allocate it using kvzalloc in most places of the driver. fix rest of the places to use kvzalloc to avoid failure in allocation when memory is too fragmented. Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 76e68d95	19-Nov-2020	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT, Avoid false lock dependency warning To avoid false lock dependency warning set the ct_entries_ht lock class different than the lock class of the ht being used when deleting last flow from a group and then deleting a group, we get into del_sw_flow_group() which call rhashtable_destroy on fg->ftes_hash which will take ht->mutex but it's different than the ht->mutex here. ====================================================== WARNING: possible circular locking dependency detected 5.10.0-rc2+ #8 Tainted: G O ------------------------------------------------------ revalidator23/24009 is trying to acquire lock: ffff888128d83828 (&node->lock){++++}-{3:3}, at: mlx5_del_flow_rules+0x83/0x7a0 [mlx5_core] but task is already holding lock: ffff8881081ef518 (&ht->mutex){+.+.}-{3:3}, at: rhashtable_free_and_destroy+0x37/0x720 which lock already depends on the new lock. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# d24f847e	08-Mar-2021	Ariel Levkovich <lariel@nvidia.com>	net/mlx5e: Fix mapping of ct_label zero ct_label 0 is a default label each flow has and therefore there can be rules that match on ct_label=0 without a prior rule that set the ct_label to this value. The ct_label value is not used directly in the HW rules and instead it is mapped to some id within a defined range and this id is used to set and match the metadata register which carries the ct_label. If we have a rule that matches on ct_label=0, the hw rule will perform matching on a value that is != 0 because of the mapping from label to id. Since the metadata register default value is 0 and it was never set before to anything else by an action that sets the ct_label, there will always be a mismatch between that register and the value in the rule. To support such rule, a forced mapping of ct_label 0 to id=0 is done so that it will match the metadata register default value of 0. Fixes: 54b154ecfb8c ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 96b5b458	04-Mar-2021	Dima Chumak <dchumak@nvidia.com>	net/mlx5e: Offload tuple rewrite for non-CT flows Setting connection tracking OVS flows and then setting non-CT flows that use tuple rewrite action (e.g. mod_tp_dst), causes the latter flows not being offloaded. Fix by using a stricter condition in modify_header_match_supported() to check tuple rewrite support only for flows with CT action. The check is factored out into standalone modify_tuple_supported() function to aid readability. Fixes: 7e36feeb0467 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# a2173131	11-Jan-2021	Oz Shlomo <ozsh@nvidia.com>	net/mlx5e: CT: manage the lifetime of the ct entry object The ct entry object is accessed by the ct add, del, stats and restore methods. In addition, it is referenced from several hash tables. The lifetime of the ct entry object was not managed which triggered race conditions as in the following kasan dump: [ 3374.973945] ================================================================== [ 3374.988552] BUG: KASAN: use-after-free in memcmp+0x4c/0x98 [ 3374.999590] Read of size 1 at addr ffff00036129ea55 by task ksoftirqd/1/15 [ 3375.016415] CPU: 1 PID: 15 Comm: ksoftirqd/1 Tainted: G O 5.4.31+ #1 [ 3375.055301] Call trace: [ 3375.060214] dump_backtrace+0x0/0x238 [ 3375.067580] show_stack+0x24/0x30 [ 3375.074244] dump_stack+0xe0/0x118 [ 3375.081085] print_address_description.isra.9+0x74/0x3d0 [ 3375.091771] __kasan_report+0x198/0x1e8 [ 3375.099486] kasan_report+0xc/0x18 [ 3375.106324] __asan_load1+0x60/0x68 [ 3375.113338] memcmp+0x4c/0x98 [ 3375.119409] mlx5e_tc_ct_restore_flow+0x3a4/0x6f8 [mlx5_core] [ 3375.131073] mlx5e_rep_tc_update_skb+0x1d4/0x2f0 [mlx5_core] [ 3375.142553] mlx5e_handle_rx_cqe_rep+0x198/0x308 [mlx5_core] [ 3375.154034] mlx5e_poll_rx_cq+0x2a0/0x1060 [mlx5_core] [ 3375.164459] mlx5e_napi_poll+0x1d4/0xa78 [mlx5_core] [ 3375.174453] net_rx_action+0x28c/0x7a8 [ 3375.182004] __do_softirq+0x1b4/0x5d0 Manage the lifetime of the ct entry object by using synchronization mechanisms for concurrent access. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# c7b9038d	25-Jan-2021	Vlad Buslov <vladbu@nvidia.com>	net/mlx5e: TC preparation refactoring for routing update event Following patch in series implements routing update event which requires ability to modify rule match_to_reg modify header actions dynamically during rule lifetime. In order to accommodate such behavior, refactor and extend TC infrastructure in following ways: - Modify mod_hdr infrastructure to preserve its parse attribute for whole rule lifetime, instead of deallocating it after rule creation. - Extend match_to_reg infrastructure with new function mlx5e_tc_match_to_reg_set_and_get_id() that returns mod_hdr action id that can be used afterwards to update the action, and mlx5e_tc_match_to_reg_mod_hdr_change() that can modify existing actions by its id. - Extend tun API with new functions mlx5e_tc_tun_update_header_ipv{4\|6}() that are used to update existing encap entry tunnel header. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 275c21d6	23-Sep-2020	Vlad Buslov <vladbu@nvidia.com>	net/mlx5e: Always set attr mdev pointer Eswitch offloads extensions in following patches in the series require attr->esw_attr->in_mdev pointer to always be set. This is already the case for all code paths except mlx5_tc_ct_entry_add_rule() function. Fix the function to assign mdev pointer with priv->mdev value. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 902c0245	07-Jan-2021	Saeed Mahameed <saeedm@nvidia.com>	net/mlx5e: CT: remove useless conversion to PTR_ERR then ERR_PTR Just return the ptr directly. Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 6895cb3a	27-Jan-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Add support for matching on ct_state reply flag Add support for matching on ct_state reply flag. Example: $ tc filter add dev ens1f0_0 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est+rpl \ action mirred egress redirect dev ens1f0_1 $ tc filter add dev ens1f0_1 ingress prio 1 chain 1 proto ip flower \ ct_state +trk+est-rpl \ action mirred egress redirect dev ens1f0_0 Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# 763e1e54	12-Jan-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT: Remove redundant usage of zone mask The zone member is of type u16 so there is no reason to apply the zone mask on it. This is also matching the call to set a match in other places which don't need and don't apply the mask. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# f822cf86	12-Jan-2021	Roi Dayan <roid@nvidia.com>	net/mlx5e: CT: Pass null instead of zero spec No need to pass zero spec to mlx5_add_flow_rules() as the function can handle null spec. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# e2194a17	25-Jan-2021	Paul Blakey <paulb@nvidia.com>	net/mlx5: CT: Fix incorrect removal of tuple_nat_node from nat rhashtable A non-NAT tuple entry is inserted only into the regular tuples rhashtable (ct_tuples_ht) and not into the natted tuples rhashtable (ct_nat_tuples_ht). Commit bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") mixed up the return labels and names so that on cleanup or failure we still try to remove it from the natted tuples rhashtable. Fix that by correctly checking whether a natted tuple was inserted before removing it. While here make it more readable. Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# eed38eee	07-Dec-2020	Oz Shlomo <ozsh@nvidia.com>	net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled Connection counters may be shared for both directions when the counter is used for connection aging purposes. However, if TC flow accounting is enabled then a unique counter is required per direction. Instantiate a unique counter per direction if the conntrack accounting extension is enabled. Use a shared counter when the connection accounting extension is disabled. Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Oz Shlomo <ozsh@nvidia.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 2b021989	31-Aug-2020	Maor Dickman <maord@nvidia.com>	net/mlx5e: CT, Fix coverity issue The cited commit introduced the following coverity issue at function mlx5_tc_ct_rule_to_tuple_nat: - Memory - corruptions (OVERRUN) Overrunning array "tuple->ip.src_v6.in6_u.u6_addr32" of 4 4-byte elements at element index 7 (byte offset 31) using index "ip6_offset" (which evaluates to 7). In case of IPv6 destination address rewrite, ip6_offset values are between 4 to 7, which will cause memory overrun of array "tuple->ip.src_v6.in6_u.u6_addr32" to array "tuple->ip.dst_v6.in6_u.u6_addr32". Fixed by writing the value directly to array "tuple->ip.dst_v6.in6_u.u6_addr32" in case ip6_offset values are between 4 to 7. Fixes: bc562be9674b ("net/mlx5e: CT: Save ct entries tuples in hashtables") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 7b2b16ee	27-Sep-2020	Dan Carpenter <dan.carpenter@oracle.com>	net/mlx5e: Fix a use after free on error in mlx5_tc_ct_shared_counter_get() This code frees "shared_counter" and then dereferences on the next line to get the error code. Fixes: 1edae2335adf ("net/mlx5e: CT: Use the same counter for both directions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 670c239a	30-Aug-2020	Ariel Levkovich <lariel@nvidia.com>	net/mlx5e: Keep direct reference to mlx5_core_dev in tc ct Keep and use a direct reference to the mlx5 core device in all of tc_ct code instead of accessing it via a pointer to mlx5 eswitch in order to support nic mode ct offload for VF devices that don't have a valid eswitch pointer set. Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 89fbdbae	22-Sep-2020	Saeed Mahameed <saeedm@nvidia.com>	net/mlx5e: TC: Remove unused parameter from mlx5_tc_ct_add_no_trk_match() priv is never used in this function Fixes: 7e36feeb0467 ("net/mlx5e: CT: Don't offload tuple rewrites for established tuples") Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# 1edae233	19-Jul-2020	Oz Shlomo <ozsh@mellanox.com>	net/mlx5e: CT: Use the same counter for both directions A connection is represented by two 5-tuple entries, one for each direction. Currently, each direction allocates its own hw counter, which is inefficient as ct aging is managed per connection. Share the counter that was allocated for the original direction with the reverse direction. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# aedd133d	21-Jul-2020	Ariel Levkovich <lariel@mellanox.com>	net/mlx5e: Support CT offload for tc nic flows Adding support to perform CT related tc actions and matching on CT states for nic flows. The ct flows management and handling will be done using a new instance of the ct database that is declared in this patch to keep it separate from the eswitch ct flows database. Offloading and unoffloading ct flows will be done using the existing ct offload api by providing it the relevant ct database reference in each mode. In addition, refactoring the tc ct api is introduced to make it agnostic to the flow type and perform the resource allocations and rule insertion to the proper steering domain in the device. In the initialization call, the api requests and stores in the ct database instance all the relevant information that distinguishes between nic flows and esw flows, such as chains database, steering namespace and mod hdr table. This way the operations of adding and removing ct flows to the device can later performed agnostically to the flow type. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 211a5364	07-May-2020	Ariel Levkovich <lariel@mellanox.com>	net/mlx5e: rework ct offload init messages The changes are: - Use mlx5_core print macros instead of netdev_warn since netdev is not always initialized at that stage. - Print a warning message in case the issue is with lack of support for CT offload without indicating an error. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# c620b772	29-Apr-2020	Ariel Levkovich <lariel@mellanox.com>	net/mlx5: Refactor tc flow attributes structure In order to support chains and connection tracking offload for nic flows, there's a need to introduce a common flow attributes struct so that these features can be agnostic and have access to a single attributes struct, regardless of the flow type. Therefore, a new tc flow attributes format is introduced to allow access to attributes that are common to eswitch and nic flows. The common attributes will always get allocated for the new flows, regardless of their type, while the type specific attributes are separated into different structs and will be allocated based on the flow type to avoid memory waste. When allocating the flow attributes the caller provides the flow steering namespace and according the namespace type the additional space for the extra, type specific, attributes is determined and added to the total attribute allocation size. In addition, the attributes that are going to be common to both flow types are moved to the common attributes struct. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
# ae430332	24-Apr-2020	Ariel Levkovich <lariel@mellanox.com>	net/mlx5: Refactor multi chains and prios support Decouple the chains infrastructure from eswitch and make it generic to support other steering namespaces. The change defines an agnostic data structure to keep all the relevant information for maintaining flow table chaining in any steering namespace. Each namespace that requires table chaining will be required to allocate such data structure. The chains creation code will receive the steering namespace and flow table parameters from the caller so it will operate agnostically when creating the required resources to maintain the table chaining function, while parts of the code that are relevant to eswitch-specific functionality are moved to eswitch files. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 4c8594ad	26-Jul-2020	Roi Dayan <roid@mellanox.com>	net/mlx5e: CT: Fix freeing ct_label mapping Add missing mapping remove call when removing ct rule, as the mapping was allocated when the ct rule was added with ct_label. Also there is a missing mapping remove call in error flow. Fixes: 54b154ecfb8c ("net/mlx5e: CT: Map 128 bits labels to 32 bit map ID") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 54b154ec	18-Jun-2020	Eli Britstein <elibr@mellanox.com>	net/mlx5e: CT: Map 128 bits labels to 32 bit map ID The 128 bits ct_label field is matched using a 32 bit hardware register. As such, only the lower 32 bits of ct_label field are offloaded. Change this logic to support setting and matching higher bits too. Map the 128 bits data to a unique 32 bits ID. Matching is done as exact match of the mapping ID of key & mask. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Dickman <maord@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# bbe11249	30-Jun-2020	Roi Dayan <roid@mellanox.com>	net/mlx5e: CT: Fix releasing ft entries Before this commit, on ft flush, ft entries were not removed from the ct_tuple hashtables. Fix it. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# de96d573	04-May-2020	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5e: CT: Remove unused function param "flow" parameter is not used in __mlx5_tc_ct_flow_offload_clear(), remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
# 2acc4551	04-May-2020	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5e: CT: Return err_ptr from internal functions Instead of having to deal with converting between int and ERR_PTR for return values in mlx5_tc_ct_flow_offload(), make the internal helper functions return a ptr to mlx5_flow_handle instead of passing it as output param, this will also avoid gcc confusion and false alarms, thus we remove the redundant ERR_PTR rule initialization. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
# 8f5b3c3e	05-May-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Use mapping for zone restore register Use a single byte mapping for zone restore register (zone matching remains 16 bit). This makes room for using the freed 8 bits on register C1 for mapping more tunnels and tunnel options. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 6702d393	18-Feb-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Re-use tuple modify headers for identical modify actions After removing the tupleid register which changed per tuple, tuple modify headers set the ct_state, zone, mark, and label registers. For non-natted tuples going through the same tc rules path, their values will be the same, and all their modify headers will be the same. Re-use tuple modify header when possible, by adding each new modify header to a hashtable, and looking up identical ones before creating a new one. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# a8eb919b	29-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Restore ct state from lookup in zone instead of tupleid Remove tupleid, and replace it with zone_restore, which is the zone an established tuple sets after match. On miss, Use this zone + tuple taken from the skb, to lookup the ct entry and restore it. This improves flow insertion rate by avoiding the allocation of a header rewrite context. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 7e36feeb	22-Apr-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Don't offload tuple rewrites for established tuples Next patches will remove the tupleid register that is used to restore the ct state on miss, and instead use the tuple on the missed packet to lookup which state to restore. Disable tuple rewrites after connection tracking. For tuple rewrites, inject a ct_state=-trk match so it won't change the tuple for established flows (+trk) that passed connection tracking, and instead miss to software. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# bc562be9	29-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Save ct entries tuples in hashtables Save original tuple and natted tuple in two new hashtables. This is a pre-step for restoring ct state after hw miss by performing a 5-tuple lookup on the hash tables. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# eb32b3f5	28-Jun-2020	Eli Britstein <elibr@mellanox.com>	net/mlx5e: CT: Fix memory leak in cleanup CT entries are deleted via a workqueue from netfilter. If removing the module before that, the rules are cleaned by the driver itself, but the memory entries for them are not freed. Fix that. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 4b61d3e8	19-Jun-2020	Po Liu <Po.Liu@nxp.com>	net: qos offload add flow status with dropped count This patch adds a drop frames counter to tc flower offloading. Reporting h/w dropped frames is necessary for some actions. Some actions like police action and the coming introduced stream gate action would produce dropped frames which is necessary for user. Status update shows how many filtered packets increasing and how many dropped in those packets. v2: Changes - Update commit comments suggest by Jiri Pirko. Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 0d156f2d	07-Jun-2020	Oz Shlomo <ozsh@mellanox.com>	net/mlx5e: CT: Fix ipv6 nat header rewrite actions Set the ipv6 word fields according to the hardware definitions. Fixes: ac991b48d43c ("net/mlx5e: CT: Offload established flows") Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# fca53304	18-May-2020	Eli Britstein <elibr@mellanox.com>	net/mlx5e: Optimize performance for IPv4/IPv6 ethertype The HW is optimized for IPv4/IPv6. For such cases, pending capability, avoid matching on ethertype, and use ip_version field instead. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 4a5d5d73	11-May-2020	Eli Britstein <elibr@mellanox.com>	net/mlx5e: Helper function to set ethertype Set ethertype match in a helper function as a pre-step towards optimizing it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# d37bd5e8	18-May-2020	Roi Dayan <roid@mellanox.com>	net/mlx5e: CT: Correctly get flow rule The correct way is to use the flow_cls_offload_flow_rule() wrapper instead of f->rule directly. Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 9102d836	12-Apr-2020	Roi Dayan <roid@mellanox.com>	net/mlx5e: CT: Fix offload with CT action after CT NAT action A chain of rules may perform the CT action again after a CT NAT action. Before this fix, matching breaks because we enter the CT table after the NAT changes, and not the CT NAT table. Fix this by adding pre ct and pre ct nat tables to skip ct/ct_nat tables and go straight to post_ct table if ct/nat was already done. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# d2658b4a	14-Apr-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5: CT: Remove unused variables Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 70a5698a	26-Apr-2020	Roi Dayan <roid@mellanox.com>	net/mlx5e: CT: Avoid false warning about rule may be used uninitialized Avoid gcc warning by preset rule to invalid ptr. Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# e59b254c	24-Apr-2020	Zheng Bin <zhengbin13@huawei.com>	net/mlx5e: Remove unneeded semicolon Fixes coccicheck warning: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:690:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# d65dbedf	24-Apr-2020	Huy Nguyen <huyn@mellanox.com>	net/mlx5: Add support for COPY steering action Add COPY type to modify_header action. IPsec feature is the first feature that needs COPY steering action. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Leon Romanovsky <leonro@mellanox.com>
# 70840b66	06-Apr-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation After allowing parallel tuple insertion, we get the following trace: [ 5505.142249] ------------[ cut here ]------------ [ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180 [ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G OE 5.6.0+ #78 [ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017 [ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table] [ 5505.323257] RIP: 0010:delete_node+0x16c/0x180 [ 5505.349862] RSP: 0018:ffffb19184eb7b30 EFLAGS: 00010282 [ 5505.356785] RAX: 0000000000000000 RBX: ffff904ac95b86d8 RCX: ffff904b6f938838 [ 5505.365190] RDX: 0000000000000000 RSI: ffff904ac954b908 RDI: ffff904ac954b920 [ 5505.373628] RBP: ffff904b4ac13060 R08: 0000000000000001 R09: 0000000000000000 [ 5505.382155] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000 [ 5505.390527] R13: ffffb19184eb7bfc R14: ffff904b6bef5800 R15: ffff90482c1203c0 [ 5505.399246] FS: 0000000000000000(0000) GS:ffff904c2fc80000(0000) knlGS:0000000000000000 [ 5505.408621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 5505.415739] CR2: 00007f5d27006010 CR3: 0000000058c10006 CR4: 00000000001626e0 [ 5505.424547] Call Trace: [ 5505.428429] idr_alloc_u32+0x7b/0xc0 [ 5505.433803] mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core] [ 5505.441354] ? mlx5_fc_create+0x23c/0x370 [mlx5_core] [ 5505.448225] mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core] [ 5505.456278] ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core] [ 5505.464532] nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table] [ 5505.472286] ? __kmalloc+0x217/0x2f0 [ 5505.477093] ? flow_rule_alloc+0x1c/0x30 [ 5505.482117] flow_offload_work_handler+0x1d0/0x290 [nf_flow_table] [ 5505.489674] ? process_one_work+0x17c/0x580 [ 5505.494922] process_one_work+0x202/0x580 [ 5505.500082] ? process_one_work+0x17c/0x580 [ 5505.505696] worker_thread+0x4c/0x3f0 [ 5505.510458] kthread+0x103/0x140 [ 5505.514989] ? process_one_work+0x580/0x580 [ 5505.520616] ? kthread_bind+0x10/0x10 [ 5505.525837] ret_from_fork+0x3a/0x50 [ 5505.570841] ---[ end trace 07995de9c56d6831 ]--- This happens from parallel deletes/adds to idr, as idr isn't protected. Fix that by using xarray as the tuple_ids allocator instead of idr. Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command") Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 9808dd0a	26-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Use rhashtable's ct entries instead of a separate list Fixes CT entries list corruption. After allowing parallel insertion/removals in upper nf flow table layer, unprotected ct entries list can be corrupted by parallel add/del on the same flow table. CT entries list is only used while freeing a ct zone flow table to go over all the ct entries offloaded on that zone/table, and flush the table. As rhashtable already provides an api to go over all the inserted entries, fix the race by using the rhashtable iteration instead, and remove the list. Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command") Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 93a129eb	28-Mar-2020	Jiri Pirko <jiri@mellanox.com>	net: sched: expose HW stats types per action used by drivers It may be up to the driver (in case ANY HW stats is passed) to select which type of HW stats he is going to use. Add an infrastructure to expose this information to user. $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop $ tc -s filter show dev enp3s0np1 ingress filter protocol ip pref 1 flower chain 0 filter protocol ip pref 1 flower chain 0 handle 0x1 eth_type ipv4 dst_ip 192.168.1.1 in_hw in_hw_count 2 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 10 sec used 10 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats immediate <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 49964352	13-Mar-2020	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5: E-Switch: Move eswitch chains to a new directory eswitch_offloads_chains.{c,h} were just introduced this kernel release cycle, eswitch is in high development demand right now and many features are planned to be added to it. eswitch deserves its own directory and here we move these new files to there, in preparation for upcoming eswitch features and new files. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com>
# aded104d	16-Mar-2020	Saeed Mahameed <saeedm@mellanox.com>	net/mlx5e: CT: Fix stack usage compiler warning Fix the following warnings: [-Werror=frame-larger-than=] In function ‘mlx5_tc_ct_entry_add_rule’: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:541:1: error: the frame size of 1136 bytes is larger than 1024 bytes In function ‘__mlx5_tc_ct_flow_offload’: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:1049:1: error: the frame size of 1168 bytes is larger than 1024 bytes Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com>
# 35e725e1	14-Mar-2020	YueHaibing <yuehaibing@huawei.com>	net/mlx5e: CT: remove set but not used variable 'unnew' drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c: In function mlx5_tc_ct_parse_match: drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:699:36: warning: variable unnew set but not used [-Wunused-but-set-variable] Fixes: 4c3844d9e97e ("net/mlx5e: CT: Introduce connection tracking") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
# 1ef3018f	11-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Support clear action Clear action, as with software, removes all ct metadata from the packet. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 5c6b9460	11-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Handle misses after executing CT action Mark packets with a unique tupleid, and on miss use that id to get the act ct restore_cookie. Using that restore cookie, we ask CT to restore the relevant info on the SKB. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# ac991b48	11-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Offload established flows Register driver callbacks with the nf flow table platform. FT add/delete events will create/delete FTE in the CT/CT_NAT tables. Restoring the CT state on miss will be added in the following patch. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
# 4c3844d9	11-Mar-2020	Paul Blakey <paulb@mellanox.com>	net/mlx5e: CT: Introduce connection tracking Add support for offloading tc ct action and ct matches. We translate the tc filter with CT action the following HW model: +-------------------+ +--------------------+ +--------------+ + pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct +-----> + original match + \| + tuple + zone match + \| + fte_id match + \| +-------------------+ \| +--------------------+ \| +--------------+ \| v v v set chain miss mapping set mark original set fte_id set label filter set zone set established actions set tunnel_id do nat (if needed) do decap Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>