History log of /linux-master/drivers/net/ethernet/intel/ice/ice_repr.c
Revision Date Author Comments
# fff292b4 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: add VF representors one by one

Implement adding representors one by one. Always set switchdev
environment when first representor is being added and clear environment
when last one is being removed.

Basic switchdev configuration remains the same. Code related to creating
and configuring representor was changed.

Instead of setting whole representors in one function handle only one
representor in setup function. The same with removing representors.

Stop representors when new one is being added or removed. Stop means,
disabling napi, stopping traffic and removing slow path rule. It is
needed because ::q_id will change after remapping, so each representor
will need new rule.

When representor are stopped rebuild control plane VSI with one more or
one less queue. One more if new representor is being added, one less if
representor is being removed.

Bridge port is removed during unregister_netdev() call on PR, so there
is no need to call it from driver side.

After that do remap new queues to correct vector. At the end start all
representors (napi enable, start queues, add slow path rule).

Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 86197ad5 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: set Tx topology every time new repr is added

It is needed to track correct Tx topology. Update it every time new
representor is created or remove node in case of removing corresponding
representor.

Still clear all node when removing switchdev mode as part of Tx topology
isn't related only to representors. Also clear ::rate_note value to
prevent skipping this node next time Tx topology is created.

Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# deb53f20 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: return pointer to representor

In follow up patches it will be easier to obtain created port
representor pointer instead of the id. Without it the pattern from
eswitch side will look like:
- create PR
- get PR based on the id

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 604283e9 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: make representor code generic

Representor code needs to be independent from specific device type, like
in this case VF. Make generic add / remove representor function and
specific add VF / rem VF function. New device types will follow this
scheme.

In bridge offload code there is a need to get representor pointer based
on VSI. Implement helper function to achieve that.

Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e4c46abc 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: remove VF pointer reference in eswitch code

Make eswitch code generic by removing VF pointer reference in functions.
It is needed to support eswitch mode for other type of devices.

Previously queue id used for Rx was based on VF number. Use ::q_id saved
in port representor instead.

After adding or removing port representor ::q_id value can change. It
isn't good idea to iterate over representors list using this value.
Use xa_find starting from the first one instead to get next port
representor to remap.

The number of port representors has to be equal to ::num_rx/tx_q. Warn if
it isn't true.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# af41b185 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: track port representors in xarray

Instead of assuming that each VF has pointer to port representor save it
in xarray. It will allow adding port representor for other device types.

Drop reference to VF where it is use only to get port representor. Get
it from xarray instead.

The functions will no longer by specific for VF, rename them.

Track id assigned by xarray in port representor structure. The id can't
be used as ::q_id, because it is fixed during port representor lifetime.
::q_id can change after adding / removing other port representors.

Side effect of removing VF pointer is that we are losing VF MAC
information used in unrolling. Store it in port representor as parent
MAC.

Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 5c53c122 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: use repr instead of vf->repr

Extract repr from vf->repr as it is often use in the ice_repr_rem().

Remove meaningless clearing of q_vector and netdev pointers as kfree is
called on repr pointer.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 7c37bf99 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: track q_id in representor

Previously queue index of control plane VSI used by port representor was
always id of VF. If we want to allow adding port representors for
different devices we have to track queue index in the port representor
structure.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f6e8fb55 12-Jul-2023 Wojciech Drewek <wojciech.drewek@intel.com>

ice: Implement basic eswitch bridge setup

With this patch, ice driver is able to track if the port
representors or uplink port were added to the linux bridge in
switchdev mode. Listen for NETDEV_CHANGEUPPER events in order to
detect this. ice_esw_br data structure reflects the linux bridge
and stores all the ports of the bridge (ice_esw_br_port) in
xarray, it's created when the first port is added to the bridge and
freed once the last port is removed. Note that only one bridge is
supported per eswitch.

Bridge port (ice_esw_br_port) can be either a VF port representor
port or uplink port (ice_esw_br_port_type). In both cases bridge port
holds a reference to the VSI, VF's VSI in case of the PR and uplink
VSI in case of the uplink. VSI's index is used as an index to the
xarray in which ports are stored.

Add a check which prevents configuring switchdev mode if uplink is
already added to any bridge. This is needed because we need to listen
for NETDEV_CHANGEUPPER events to record if the uplink was added to
the bridge. Netdevice notifier is registered after eswitch mode
is changed to switchdev.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0ef4479d 07-Apr-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: use src VSI instead of src MAC in slow-path

The use of a source MAC to direct packets from the VF to the corresponding
port representor is only ok if there is only one MAC on a VF. To support
this functionality when the number of MACs on a VF is greater, it is
necessary to match a source VSI instead of a source MAC.

Let's use the new switch API that allows matching on metadata.

If MAC isn't used in match criteria there is no need to handle adding
rule after virtchnl command. Instead add new rule while port representor
is being configured.

Remove rule_added field, checking for sp_rule can be used instead.
Remove also checking for switchdev running in deleting rule as it can be
called from unroll context when running flag isn't set. Checking for
sp_rule covers both context (with and without running flag).

Rules are added in eswitch configuration flow, so there is no need to
have replay function.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 288ecf49 18-Nov-2022 Benjamin Mikailenko <benjamin.mikailenko@intel.com>

ice: Accumulate ring statistics over reset

Resets may occur with or without user interaction. For example, a TX hang
or reconfiguration of parameters will result in a reset. During reset, the
VSI is freed, freeing any statistics structures inside as well. This would
create an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks ring statistics expecting them
to be populated.

To ensure this doesn't happen, accumulate ring statistics over reset.

Define a new ring statistics structure, ice_ring_stats. The new structure
lives in the VSI's parent, preserving ring statistics when VSI is freed.

1. Define a new structure vsi_ring_stats in the PF scope
2. Allocate/free stats only during probe, unload, or change in ring size
3. Replace previous ring statistics functionality with new structure

Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 80fe30a8 15-Nov-2022 Michal Wilczynski <michal.wilczynski@intel.com>

ice: Prevent ADQ, DCB coexistence with Custom Tx scheduler

ADQ, DCB might interfere with Custom Tx Scheduler changes that user
might introduce using devlink-rate API.

Check if ADQ, DCB is active, when user tries to change any setting
in exported Tx scheduler tree. If any of those are active block the user
from doing so, and log an appropriate message.

Remove the exported hierarchy if user enable ADQ or DCB.
Prevent ADQ or DCB from getting configured if user already made some
changes using devlink-rate API.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 42c2eb6b 15-Nov-2022 Michal Wilczynski <michal.wilczynski@intel.com>

ice: Implement devlink-rate API

There is a need to support modification of Tx scheduler tree, in the
ice driver. This will allow user to control Tx settings of each node in
the internal hierarchy of nodes. As a result user will be able to use
Hierarchy QoS implemented entirely in the hardware.

This patch implemenents devlink-rate API. It also exports initial
default hierarchy. It's mostly dictated by the fact that the tree
can't be removed entirely, all we can do is enable the user to modify
it. For example root node shouldn't ever be removed, also nodes that
have children are off-limits.

Example initial tree with 2 VF's:

[root@fedora ~]# devlink port function rate show

pci/0000:4b:00.0/node_27: type node parent node_26
pci/0000:4b:00.0/node_26: type node parent node_0
pci/0000:4b:00.0/node_34: type node parent node_33
pci/0000:4b:00.0/node_33: type node parent node_32
pci/0000:4b:00.0/node_32: type node parent node_16
pci/0000:4b:00.0/node_19: type node parent node_18
pci/0000:4b:00.0/node_18: type node parent node_17
pci/0000:4b:00.0/node_17: type node parent node_16
pci/0000:4b:00.0/node_21: type node parent node_20
pci/0000:4b:00.0/node_20: type node parent node_3
pci/0000:4b:00.0/node_14: type node parent node_5
pci/0000:4b:00.0/node_5: type node parent node_3
pci/0000:4b:00.0/node_13: type node parent node_4
pci/0000:4b:00.0/node_12: type node parent node_4
pci/0000:4b:00.0/node_11: type node parent node_4
pci/0000:4b:00.0/node_10: type node parent node_4
pci/0000:4b:00.0/node_9: type node parent node_4
pci/0000:4b:00.0/node_8: type node parent node_4
pci/0000:4b:00.0/node_7: type node parent node_4
pci/0000:4b:00.0/node_6: type node parent node_4
pci/0000:4b:00.0/node_4: type node parent node_3
pci/0000:4b:00.0/node_3: type node parent node_16
pci/0000:4b:00.0/node_16: type node parent node_15
pci/0000:4b:00.0/node_15: type node parent node_0
pci/0000:4b:00.0/node_2: type node parent node_1
pci/0000:4b:00.0/node_1: type node parent node_0
pci/0000:4b:00.0/node_0: type node
pci/0000:4b:00.0/1: type leaf parent node_27
pci/0000:4b:00.0/2: type leaf parent node_27

Let me visualize part of the tree:

+---------+
| node_0 |
+---------+
|
+----v----+
| node_26 |
+----+----+
|
+----v----+
| node_27 |
+----+----+
|
|-----------------|
+----v----+ +----v----+
| VF 1 | | VF 2 |
+----+----+ +----+----+

So at this point there is a couple things that can be done.
For example we could only assign parameters to VF's.

[root@fedora ~]# devlink port function rate set pci/0000:4b:00.0/1 \
tx_max 5Gbps

This would cap the VF 1 BW to 5Gbps.

But let's say you would like to create a completely new branch.
This can be done like this:

[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom parent node_0
[root@fedora ~]# devlink port function rate add \
pci/0000:4b:00.0/node_custom_1 parent node_custom
[root@fedora ~]# devlink port function rate set \
pci/0000:4b:00.0/1 parent node_custom_1

This creates a completely new branch and reassigns VF 1 to it.

A number of parameters is supported per each node: tx_max, tx_share,
tx_priority and tx_weight.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 77df1db8 02-Nov-2022 Jiri Pirko <jiri@nvidia.com>

net: remove unused ndo_get_devlink_port

Remove ndo_get_devlink_port which is no longer used alongside with the
implementations in drivers.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# ac73d4bf 02-Nov-2022 Jiri Pirko <jiri@nvidia.com>

net: make drivers to use SET_NETDEV_DEVLINK_PORT to set devlink_port

Benefit from the previously implemented tracking of netdev events in
devlink code and instead of calling devlink_port_type_eth_set() and
devlink_port_type_clear() to set devlink port type and link to related
netdev, use SET_NETDEV_DEVLINK_PORT() macro to assign devlink_port
pointer to netdevice which is about to be registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# a286ba73 26-Sep-2022 Jiri Pirko <jiri@nvidia.com>

ice: reorder PF/representor devlink port register/unregister flows

Make sure that netdevice is registered/unregistered while devlink port
is registered.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 6384b769 28-Mar-2022 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: link representors to PCI device

Link port representors to parent PCI device to benefit from
systemd defined naming scheme.

Example from ip tool:
- without linking:
eth0 ...
- with linking:
eth0 ...
altname enp24s0f0npf0vf0

The port representor name is being shown in altname, because the name is
longer than IFNAMSIZ (16) limit. Altname can be used in ip tool.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# baeb705f 11-Apr-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: always check VF VSI pointer values

The ice_get_vf_vsi function can return NULL in some cases, such as if
handling messages during a reset where the VSI is being removed and
recreated.

Several places throughout the driver do not bother to check whether this
VSI pointer is valid. Static analysis tools maybe report issues because
they detect paths where a potentially NULL pointer could be dereferenced.

Fix this by checking the return value of ice_get_vf_vsi everywhere.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# a7e11710 22-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: convert vf->vc_ops to a const pointer

The vc_ops structure is used to allow different handlers for virtchnl
commands when the driver is in representor mode. The current
implementation uses a copy of the ops table in each VF, and modifies
this copy dynamically.

The usual practice in kernel code is to store the ops table in a
constant structure and point to different versions. This has a number of
advantages:

1. Reduced memory usage. Each VF merely points to the correct table,
so they're able to re-use the same constant lookup table in memory.
2. Consistency. It becomes more difficult to accidentally update or
edit only one op call. Instead, the code switches to the correct
able by a single pointer write. In general this is atomic, either
the pointer is updated or its not.
3. Code Layout. The VF structure can store a pointer to the table
without needing to have the full structure definition defined prior
to the VF structure definition. This will aid in future refactoring
of code by allowing the VF pointer to be kept in ice_vf_lib.h while
the virtchnl ops table can be maintained in ice_virtchnl.h

There is one major downside in the case of the vc_ops structure. Most of
the operations in the table are the same between the two current
implementations. This can appear to lead to duplication since each
implementation must now fill in the complete table. It could make
spotting the differences in the representor mode more challenging.
Unfortunately, methods to make this less error prone either add
complexity overhead (macros using CPP token concatenation) or don't work
on all compilers we support (constant initializer from another constant
structure).

The cost of maintaining two structures does not out weigh the benefits
of the constant table model.

While we're making these changes, go ahead and rename the structure and
implementations with "virtchnl" instead of "vc_vf_". This will more
closely align with the planned file renaming, and avoid similar names when
we later introduce a "vf ops" table for separating Scalable IOV and
Single Root IOV implementations.

Leave the accessor/assignment functions in order to avoid issues with
compiling with options disabled. The interface makes it easier to handle
when CONFIG_PCI_IOV is disabled in the kernel.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0deb0bf7 22-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: rename ice_virtchnl_pf.c to ice_sriov.c

The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
code for implementing Single Root IOV virtualization resides. This code
includes support for bringing up and tearing down VFs, hooks into the
kernel SR-IOV netdev operations, and for handling virtchnl messages from
VFs.

In the future, we plan to support Scalable IOV in addition to Single
Root IOV as an alternative virtualization scheme. This implementation
will re-use some but not all of the code in ice_virtchnl_pf.c

To prepare for this future, we want to refactor and split up the code in
ice_virtchnl_pf.c into the following scheme:

* ice_vf_lib.[ch]

Basic VF structures and accessors. This is where scheme-independent
code will reside.

* ice_virtchnl.[ch]

Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is distinct
and has a bulk of the processing code.

* ice_sriov.[ch]

Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.

* (future) ice_siov.[ch]

Scalable IOV implementation.

As a first step, lets assume that all of the code in
ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
ice_sriov.c and its header to ice_sriov.h

Future changes will further split out the code in these files following
the plan outlined here.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c8ff29b5 27-Jan-2022 Marcin Szycik <marcin.szycik@linux.intel.com>

ice: Add slow path offload stats on port representor in switchdev

Implement callbacks to check for stats and fetch port representor stats.
Stats are taken from RX/TX ring corresponding to port representor and show
the number of bytes/packets that were not offloaded.

To see slow path stats run:
ifstat -x cpu_hits -a

Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 3d5985a1 16-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: convert VF storage to hash table with krefs and RCU

The ice driver stores VF structures in a simple array which is allocated
once at the time of VF creation. The VF structures are then accessed
from the array by their VF ID. The ID must be between 0 and the number
of allocated VFs.

Multiple threads can access this table:

* .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
* interrupts, such as due to messages from the VF using the virtchnl
communication
* processing such as device reset
* commands to add or remove VFs

The current implementation does not keep track of when all threads are
done operating on a VF and can potentially result in use-after-free
issues caused by one thread accessing a VF structure after it has been
released when removing VFs. Some of these are prevented with various
state flags and checks.

In addition, this structure is quite static and does not support a
planned future where virtualization can be more dynamic. As we begin to
look at supporting Scalable IOV with the ice driver (as opposed to just
supporting Single Root IOV), this structure is not sufficient.

In the future, VFs will be able to be added and removed individually and
dynamically.

To allow for this, and to better protect against a whole class of
use-after-free bugs, replace the VF storage with a combination of a hash
table and krefs to reference track all of the accesses to VFs through
the hash table.

A hash table still allows efficient look up of the VF given its ID, but
also allows adding and removing VFs. It does not require contiguous VF
IDs.

The use of krefs allows the cleanup of the VF memory to be delayed until
after all threads have released their reference (by calling ice_put_vf).

To prevent corruption of the hash table, a combination of RCU and the
mutex table_lock are used. Addition and removal from the hash table use
the RCU-aware hash macros. This allows simple read-only look ups that
iterate to locate a single VF can be fast using RCU. Accesses which
modify the hash table, or which can't take RCU because they sleep, will
hold the mutex lock.

By using this design, we have a stronger guarantee that the VF structure
can't be released until after all threads are finished operating on it.
We also pave the way for the more dynamic Scalable IOV implementation in
the future.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c4c2c7db 16-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: convert ice_for_each_vf to include VF entry iterator

The ice_for_each_vf macro is intended to be used to loop over all VFs.
The current implementation relies on an iterator that is the index into
the VF array in the PF structure. This forces all users to perform a
look up themselves.

This abstraction forces a lot of duplicate work on callers and leaks the
interface implementation to the caller. Replace this with an
implementation that includes the VF pointer the primary iterator. This
version simplifies callers which just want to iterate over every VF, as
they no longer need to perform their own lookup.

The "i" iterator value is replaced with a new unsigned int "bkt"
parameter, as this will match the necessary interface for replacing
the VF array with a hash table. For now, the bkt is the VF ID, but in
the future it will simply be the hash bucket index. Document that it
should not be treated as a VF ID.

This change aims to simplify switching from the array to a hash table. I
considered alternative implementations such as an xarray but decided
that the hash table was the simplest and most suitable implementation. I
also looked at methods to hide the bkt iterator entirely, but I couldn't
come up with a feasible solution that worked for hash table iterators.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# df830543 16-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: refactor unwind cleanup in eswitch mode

The code for supporting eswitch mode and port representors on VFs uses
an unwind based cleanup flow when handling errors.

These flows are used to cleanup and get everything back to the state
prior to attempting to switch from legacy to representor mode or back.

The unwind iterations make sense, but complicate a plan to refactor the
VF array structure. In the future we won't have a clean method of
reversing an iteration of the VFs.

Instead, we can change the cleanup flow to just iterate over all VF
structures and clean up appropriately.

First notice that ice_repr_add_for_all_vfs and ice_repr_rem_from_all_vfs
have an additional step of re-assigning the VC ops. There is no good
reason to do this outside of ice_repr_add and ice_repr_rem. It can
simply be done as the last step of these functions.

Second, make sure ice_repr_rem is safe to call on a VF which does not
have a representor. Check if vf->repr is NULL first and exit early if
so.

Move ice_repr_rem_from_all_vfs above ice_repr_add_for_all_vfs so that we
can call it from the cleanup function.

In ice_eswitch.c, replace the unwind iteration with a call to
ice_eswitch_release_reprs. This will go through all of the VFs and
revert the VF back to the standard model without the eswitch mode.

To make this safe, ensure this function checks whether or not the
represent or has been moved. Rely on the metadata destination in
vf->repr->dst. This must be NULL if the representor has not been moved
to eswitch mode.

Ensure that we always re-assign this value back to NULL after freeing
it, and move the ice_eswitch_release_reprs so that it can be called from
the setup function.

With these changes, eswitch cleanup no longer uses an unwind flow that
is problematic for the planned VF data structure change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c1e5da5d 25-Oct-2021 Wojciech Drewek <wojciech.drewek@intel.com>

ice: improve switchdev's slow-path

In current switchdev implementation, every VF PR is assigned to
individual ring on switchdev ctrl VSI. For slow-path traffic, there
is a mapping VF->ring done in software based on src_vsi value (by
calling ice_eswitch_get_target_netdev function).

With this change, HW solution is introduced which is more
efficient. For each VF, src MAC (VF's MAC) filter will be created,
which forwards packets to the corresponding switchdev ctrl VSI queue
based on src MAC address.

This filter has to be removed and then replayed in case of
resetting one VF. Keep information about this rule in repr->mac_rule,
thanks to that we know which rule has to be removed and replayed
for a given VF.

In case of CORE/GLOBAL all rules are removed
automatically. We have to take care of readding them. This is done
by ice_replay_vsi_adv_rule.

When driver leaves switchdev mode, remove all advanced rules
from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi.

Flag repr->rule_added is needed because in some cases reset
might be triggered before VF sends request to add MAC.

Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e984c440 18-Oct-2021 Marcin Szycik <marcin.szycik@intel.com>

ice: Add support for changing MTU on PR in switchdev mode

This change adds support for changing MTU on port representor in
switchdev mode, by setting the min/max MTU values on port representor
netdev. Before it was possible to change the MTU only in a limited,
default range (68-1500).

Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 7fde6d8b 06-Aug-2021 Michal Swiatkowski <michal.swiatkowski@intel.com>

ice: ndo_setup_tc implementation for PR

Add tc-flower support for VF port representor devices.

Implement ndo_setup_tc callback for TC HW offload on VF port representors
devices. Implemented both methods: add and delete tc-flower flows.

Mark NETIF_F_HW_TC bit in net device's feature set to enable offload TC
infrastructure for port representor.

Implement TC filters replay function required to restore filters settings
while switchdev configuration is rebuilt.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 7aae80ce 19-Aug-2021 Wojciech Drewek <wojciech.drewek@intel.com>

ice: add port representor ethtool ops and stats

Introduce the following ethtool operations for VF's representor:
-get_drvinfo
-get_strings
-get_ethtool_stats
-get_sset_count
-get_link

In all cases, existing operations were used with minor
changes which allow us to detect if ethtool op was called for
representor. Only VF VSI stats will be available for representor.

Implement ndo_get_stats64 for port representor. This will update
VF VSI stats and read them.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f5396b8a 19-Aug-2021 Grzegorz Nitka <grzegorz.nitka@intel.com>

ice: switchdev slow path

Slow path means allowing packet to go from uplink to representor
and from representor to correct VF on Rx site and from VF to
representor and to uplink on Tx site.

To accomplish this driver, has to set correct Tx descriptor. When
packet is sent from representor to VF, destination should be
set to VF VSI. When packet is sent from uplink port destination
should be uplink to bypass switch infrastructure and send packet
outside.

On Rx site driver should check source VSI field from Rx descriptor
and based on that forward packed to correct netdev. To allow
this there is a target netdevs table in control plane VSI
struct.

Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# b3be918d 19-Aug-2021 Grzegorz Nitka <grzegorz.nitka@intel.com>

ice: rebuild switchdev when resetting all VFs

As resetting all VFs behaves mostly like creating new VFs also
eswitch infrastructure has to be recreated. The easiest way to
do that is to rebuild eswitch after resetting VFs.

Implement helper functions to start and stop all representors
queues. This is used to disable traffic on port representors.

In rebuild path:
- NAPI has to be disabled
- eswitch environment has to be set up
- new port representors have to be created, because the old
one had pointer to not existing VFs
- new control plane VSI ring should be remapped
- NAPI hast to be enabled
- rxdid has to be set to FLEX_NIC_2, because this descriptor id
support source_vsi, which is needed on control plane VSI queues
- port representors queues have to be started

Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1a1c40df 19-Aug-2021 Grzegorz Nitka <grzegorz.nitka@intel.com>

ice: set and release switchdev environment

Switchdev environment has to be set up when user create VFs
and eswitch mode is switchdev. Release is done when user
delete all VFs.

Data path in this implementation is based on control plane VSI.
This VSI is used to pass traffic from port representors to
corresponding VFs and vice versa. Default TX rule has to be
added to forward packet to control plane VSI. This will redirect
packets from VFs which don't match other rules to control plane
VSI.

On RX side default rule is added on uplink VSI to receive all
traffic that doesn't match other rules. When setting switchdev
environment all other rules from VFs should be removed. Packet to
VFs will be forwarded by control plane VSI.

As VF without any mac rules can't send any packet because of
antispoof mechanism, VSI antispoof should be turned off on each VFs.

To send packet from representor to correct VSI, destination VSI
field in TX descriptor will have to be filled. Allow that by
setting destination override bit in control plane VSI security config.

Packet from VFs will be received on control plane VSI. Driver
should decide to which netdev forward the packet. Decision is
made based on src_vsi field from descriptor. There is a target
netdev list in control plane VSI struct which choose netdev
based on src_vsi number.

Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# ac19e03e 19-Aug-2021 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: allow process VF opcodes in different ways

In switchdev driver shouldn't add MAC, VLAN and promisc
filters on iavf demand but should return success to not
break normal iavf flow.

Achieve that by creating table of functions pointer with
default functions used to parse iavf command. While parse
iavf command, call correct function from table instead of
calling function direct.

When port representors are being created change functions
in table to new one that behaves correctly for switchdev
puprose (ignoring new filters).

Change back to default ops when representors are being
removed.

Co-developed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 37165e3f 19-Aug-2021 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: introduce VF port representor

Port representor is used to manage VF from host side. To allow
it each created representor registers netdevice with random hw
address. Also devlink port is created for all representors.

Port representor name is created based on switch id or managed
by devlink core if devlink port was registered with success.

Open and stop ndo ops are implemented to allow managing the VF
link state. Link state is tracked in VF struct.

Struct ice_netdev_priv is extended by pointer to representor
field. This is needed to get correct representor from netdev
struct mostly used in ndo calls.

Implement helper functions to check if given netdev is netdev of
port representor (ice_is_port_repr_netdev) and to get representor
from netdev (ice_netdev_to_repr).

As driver mostly will create or destroy port representors on all
VFs instead of on single one, write functions to add and remove
representor for each VF.

Representor struct contains pointer to source VSI, which is VSI
configured on VF, backpointer to VF, backpointer to netdev,
q_vector pointer and metadata_dst which will be used in data path.

Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>