History log of /linux-master/drivers/net/ethernet/intel/ice/ice.h
Revision Date Author Comments
# 500d0df5 05-Feb-2024 Wojciech Drewek <wojciech.drewek@intel.com>

ice: Fix debugfs with devlink reload

During devlink reload it is needed to remove debugfs entries
correlated with only one PF. ice_debugfs_exit() removes all
entries created by ice driver so we can't use it.

Introduce ice_debugfs_pf_deinit() in order to release PF's
debugfs entries. Move ice_debugfs_exit() call to ice_module_exit(),
it makes more sense since ice_debugfs_init() is called in
ice_module_init() and not in ice_probe().

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 41cc4e53 05-Feb-2024 Wojciech Drewek <wojciech.drewek@intel.com>

ice: Remove and readd netdev during devlink reload

Recent changes to the devlink reload (commit 9b2348e2d6c9
("devlink: warn about existing entities during reload-reinit"))
force the drivers to destroy devlink ports during reinit.
Adjust ice driver to this requirement, unregister netdvice, destroy
devlink port. ice_init_eth() was removed and all the common code
between probe and reload was moved to ice_load().

During devlink reload we can't take devl_lock (it's already taken)
and in ice_probe() we have to lock it. Use devl_* variant of the API
which does not acquire and release devl_lock. Guard ice_load()
with devl_lock only in case of probe.

Suggested-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0ca6755f 03-Jan-2024 Aniruddha Paul <aniruddha.paul@intel.com>

ice: Add a new counter for Rx EIPE errors

HW incorrectly reports EIPE errors on encapsulated packets
with L2 padding inside inner packet. HW shows outer UDP/IPV4
packet checksum errors as part of the EIPE flags of the
Rx descriptor. These are reported only if checksum offload
is enabled and L3/L4 parsed flag is valid in Rx descriptor.

When that error is reported by HW, we don't act on it
instead of incrementing main Rx errors statistic as it
would normally happen.

Add a new statistic to count these errors since we still want
to print them.

Signed-off-by: Aniruddha Paul <aniruddha.paul@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jan Glaza <jan.glaza@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 8293e4cb 25-Jan-2024 Jacob Keller <jacob.e.keller@intel.com>

ice: introduce PTP state machine

Add PTP state machine so that the driver can correctly identify PTP
state around resets.
When the driver got information about ungraceful reset, PTP was not
prepared for reset and it returned error. When this situation occurs,
prepare PTP before rebuilding its structures.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 82e71b22 29-Nov-2023 Karol Kolacinski <karol.kolacinski@intel.com>

ice: Enable SW interrupt from FW for LL TS

Introduce new capability - Low Latency Timestamping with Interrupt.
On supported devices, driver can request a single timestamp from FW
without polling the register afterwards. Instead, FW can issue
a dedicated interrupt when the timestamp was read from the PHY register
and its value is available to read from the register.
This eliminates the need of bottom half scheduling, which results in
minimal delay for timestamping.

For this mode, allocate TS indices sequentially, so that timestamps are
always completed in FIFO manner.

Co-developed-by: Yochai Hagvi <yochai.hagvi@intel.com>
Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 00d50001 29-Nov-2023 Karol Kolacinski <karol.kolacinski@intel.com>

ice: Schedule service task in IRQ top half

Schedule service task and EXTTS in the top half to avoid bottom half
scheduling if possible, which significantly reduces timestamping delay.

Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 96a9a934 12-Dec-2023 Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

ice: configure FW logging

Users want the ability to debug FW issues by retrieving the
FW logs from the E8xx devices. Use debugfs to allow the user to
configure the log level and number of messages for FW logging.

If FW logging is supported on the E8xx then the file 'fwlog' will be
created under the PCI device ID for the ice driver. If the file does not
exist then either the E8xx doesn't support FW logging or debugfs is not
enabled on the system.

One thing users want to do is control which events are reported. The
user can read and write the 'fwlog/modules/<module name>' to get/set
the log levels. Each module in the FW that supports logging ht as a file
under 'fwlog/modules' that supports reading (to see what the current log
level is) and writing (to change the log level).

The format to set the log levels for a module are:

# echo <log level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/<module>

The supported log levels are:

* none
* error
* warning
* normal
* verbose

Each level includes the messages from the previous/lower level

The modules that are supported are:

* general
* ctrl
* link
* link_topo
* dnl
* i2c
* sdp
* mdio
* adminq
* hdma
* lldp
* dcbx
* dcb
* xlr
* nvm
* auth
* vpd
* iosf
* parser
* sw
* scheduler
* txq
* rsvd
* post
* watchdog
* task_dispatch
* mng
* synce
* health
* tsdrv
* pfreg
* mdlver
* all

The module 'all' is a special module which allows the user to read or
write to all of the modules.

The following example command would set the DCB module to the 'normal'
log level:

# echo normal > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb

If the user wants to set the DCB, Link, and the AdminQ modules to
'verbose' then the commands are:

# echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb
# echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link
# echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/adminq

If the user wants to set all modules to the 'warning' level then the
command is:

# echo warning > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

If the user wants to disable logging for a module then they can set the
level to 'none'. An example setting the 'watchdog' module is:

# echo none > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/watchdog

If the user wants to see what the log level is for a specific module
then the command is:

# cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb

This will return the log level for the DCB module. If the user wants to
see the log level for all the modules then the command is:

# cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

Writing to the module file will update the configuration, but NOT enable the
configuration (that is a separate command).

In addition to configuring the modules, the user can also configure the
number of log messages (nr_messages) to include in a single Admin Receive
Queue (ARQ) event.The range is 1-128 (1 means push every log message, 128
means push only when the max AQ command buffer is full). The suggested
value is 10.

To see/change the resolution the user can read/write the
'fwlog/nr_messages' file. An example changing the value to 50 is

# echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages

To see the current value of 'nr_messages' then the command is:

# cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 352e9bf2 12-Dec-2023 Jeff Guo <jia.guo@intel.com>

ice: enable symmetric-xor RSS for Toeplitz hash function

Allow the user to set the symmetric Toeplitz hash function via:

# ethtool -X eth0 hfunc toeplitz symmetric-xor

All existing RSS configurations will be converted to symmetric unless they
have a non-symmetric field (other than IP src/dst and L4 src/dst ports)
used for hashing. The driver will reject a new RSS configuration if such
a field is requested.

The hash function in the E800 NICs is set per-VSI and a specific AQ
command is needed to modify the hash function. Use the AQ command to
enable setting the symmetric Toeplitz RSS hash function for any VSI
in the new ice_set_rss_hfunc().

When the Symmetric Toeplitz hash function is used, the hardware sets the
input set of the RSS (Toeplitz) algorithm to be the XOR of the fields
index by HSYMM and the fields index by the INSET registers. We use this
to create a symmetric hash by setting the HSYMM registers to point to
their counterparts in the INSET registers:

HSYMM [src_fv] = dst_fv;
HSYMM [dst_fv] = src_fv;

where src_fv and dst_fv are the indexes of the protocol's src and dst
fields.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Jeff Guo <jia.guo@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://lore.kernel.org/r/20231213003321.605376-8-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 9031d5f4 05-Dec-2023 Larysa Zaremba <larysa.zaremba@intel.com>

ice: Support HW timestamp hint

Use previously refactored code and create a function
that allows XDP code to read HW timestamp.

Also, introduce packet context, where hints-related data will be stored.
ice_xdp_buff contains only a pointer to this structure, to avoid copying it
in ZC mode later in the series.

HW timestamp is the first supported hint in the driver,
so also add xdp_metadata_ops.

Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20231205210847.28460-6-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>


# 4da71a77 01-Dec-2023 Konrad Knitter <konrad.knitter@intel.com>

ice: read internal temperature sensor

Since 4.30 firmware exposes internal thermal sensor reading via admin
queue commands. Expose those readouts via hwmon API when supported.

Datasheet:

Get Sensor Reading Command (Opcode: 0x0632)

+--------------------+--------+--------------------+-------------------------+
| Name | Bytes | Value | Remarks |
+--------------------+--------+--------------------+-------------------------+
| Flags | 1-0 | | |
| Opcode | 2-3 | 0x0632 | Command opcode |
| Datalen | 4-5 | 0 | No external buffer. |
| Return value | 6-7 | | Return value. |
| Cookie High | 8-11 | Cookie | |
| Cookie Low | 12-15 | Cookie | |
| Sensor | 16 | | 0x00: Internal temp |
| | | | 0x01-0xFF: Reserved. |
| Format | 17 | Requested response | Only 0x00 is supported. |
| | | format | 0x01-0xFF: Reserved. |
| Reserved | 18-23 | | |
| Data Address high | 24-27 | Response buffer | |
| | | address | |
| Data Address low | 28-31 | Response buffer | |
| | | address | |
+--------------------+--------+--------------------+-------------------------+

Get Sensor Reading Response (Opcode: 0x0632)

+--------------------+--------+--------------------+-------------------------+
| Name | Bytes | Value | Remarks |
+--------------------+--------+--------------------+-------------------------+
| Flags | 1-0 | | |
| Opcode | 2-3 | 0x0632 | Command opcode |
| Datalen | 4-5 | 0 | No external buffer |
| Return value | 6-7 | | Return value. |
| | | | EINVAL: Invalid |
| | | | parameters |
| | | | ENOENT: Unsupported |
| | | | sensor |
| | | | EIO: Sensor access |
| | | | error |
| Cookie High | 8-11 | Cookie | |
| Cookie Low | 12-15 | Cookie | |
| Sensor Reading | 16-23 | | Format of the reading |
| | | | is dependent on request |
| Data Address high | 24-27 | Response buffer | |
| | | address | |
| Data Address low | 28-31 | Response buffer | |
| | | address | |
+--------------------+--------+--------------------+-------------------------+

Sensor Reading for Sensor 0x00 (Internal Chip Temperature):

+--------------------+--------+--------------------+-------------------------+
| Name | Bytes | Value | Remarks |
+--------------------+--------+--------------------+-------------------------+
| Thermal Sensor | 0 | | Reading in degrees |
| reading | | | Celsius. Signed int8 |
| Warning High | 1 | | Warning High threshold |
| threshold | | | in degrees Celsius. |
| | | | Unsigned int8. |
| | | | 0xFF when unsupported |
| Critical High | 2 | | Critical High threshold |
| threshold | | | in degrees Celsius. |
| | | | Unsigned int8. |
| | | | 0xFF when unsupported |
| Fatal High | 3 | | Fatal High threshold |
| threshold | | | in degrees Celsius. |
| | | | Unsigned int8. |
| | | | 0xFF when unsupported |
| Reserved | 4-7 | | |
+--------------------+--------+--------------------+-------------------------+

Driver provides current reading from HW as well as device specific
thresholds for thermal alarm (Warning, Critical, Fatal) events.

$ sensors

Output
=========================================================
ice-pci-b100
Adapter: PCI adapter
temp1: +62.0°C (high = +95.0°C, crit = +105.0°C)
(emerg = +115.0°C)

Tested on Intel Corporation Ethernet Controller E810-C for SFP

Co-developed-by: Marcin Domagala <marcinx.domagala@intel.com>
Signed-off-by: Marcin Domagala <marcinx.domagala@intel.com>
Co-developed-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 19b39cae 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: reserve number of CP queues

Rebuilding CP VSI each time the PR is created drastically increase the
time of maximum VFs creation. Add function to reserve number of CP
queues to deal with this problem.

Use the same function to decrease number of queues in case of removing
VFs. Assume that caller of ice_eswitch_reserve_cp_queues() will also
call ice_eswitch_attach/detach() correct number of times.

Still one by one PR adding is handy for VF resetting routine.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# af41b185 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: track port representors in xarray

Instead of assuming that each VF has pointer to port representor save it
in xarray. It will allow adding port representor for other device types.

Drop reference to VF where it is use only to get port representor. Get
it from xarray instead.

The functions will no longer by specific for VF, rename them.

Track id assigned by xarray in port representor structure. The id can't
be used as ::q_id, because it is fixed during port representor lifetime.
::q_id can change after adding / removing other port representors.

Side effect of removing VF pointer is that we are losing VF MAC
information used in unrolling. Store it in port representor as parent
MAC.

Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 5a841e4e 24-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: rename switchdev to eswitch

Eswitch is used as a prefix for related functions. Main structure
storing all data related to eswitch should also be named as eswitch
instead of ice_switchdev_info. Rename it.

Also rename switchdev to eswitch where the context is not about eswitch
mode.

::uplink_netdev was changed to netdev for simplicity. There is no other
netdev in function scope so it is obvious.

Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# ea4af9b4 19-Oct-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: add bitmap to track VF MSI-X usage

Create a bitamp to track MSI-X usage for VFs. The bitmap has the size of
total MSI-X amount on device, because at init time the amount of MSI-X
used by VFs isn't known.

The bitmap is used in follow up patchset to provide a block of
continuous block of MSI-X indexes for each created VF.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 982b0192 15-Oct-2023 Pawel Chmielewski <pawel.chmielewski@intel.com>

ice: Refactor finding advertised link speed

Refactor ice_get_link_ksettings to using forced speed to link modes
mapping.

Suggested-by : Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# d938a8cc 27-Jul-2023 Michal Michalik <michal.michalik@intel.com>

ice: Auxbus devices & driver for E822 TS

There is a problem in HW in E822-based devices leading to race
condition.
It might happen that, in order:
- PF0 (which owns the PHC) requests few timestamps,
- PF1 requests a timestamp,
- interrupt is being triggered and both PF0 and PF1 threads are woken
up,
- PF0 got one timestamp, still waiting for others so not going to sleep,
- PF1 gets it's timestamp, process it and go to sleep,
- PF1 requests a timestamp again,
- just before PF0 goes to sleep timestamp of PF1 appear,
- PF0 finishes all it's timestamps and go to sleep (PF1 also sleeping).
That leaves PF1 timestamp memory not read, which lead to blocking the
next interrupt from arriving.

Fix it by adding auxiliary devices and only one driver to handle all the
timestamps for all PF's by PHC owner. In the past each PF requested it's
own timestamps and process it from the start till the end which causes
problem described above. Currently each PF requests the timestamps as
before, but the actual reading of the completed timestamps is being done
by the PTP auxiliary driver, which is registered by the PF which owns PHC.

Additionally, the newly introduced auxiliary driver/devices for PTP clock
owner will be used for other features in all products (including E810).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 42d40bb2 08-Sep-2023 Jacob Keller <jacob.e.keller@intel.com>

ice: introduce ice_pf_src_tmr_owned

Add ice_pf_src_tmr_owned() macro to check the function capability bit
indicating if the current function owns the PTP hardware clock. This is
slightly shorter than the more verbose access via
hw.func_caps.ts_func_info.src_tmr_owned. Use this where possible rather
than open coding its equivalent.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 12a5a28b 16-Aug-2023 Jacob Keller <jacob.e.keller@intel.com>

ice: remove ICE_F_PTP_EXTTS feature flag

The ICE_F_PTP_EXTTS feature flag is ostensibly intended to support checking
whether the device supports external timestamp pins. It is only checked in
E810-specific code flows, and is enabled for all E810-based devices. E822
and E823 flows unconditionally enable external timestamp support.

This makes the feature flag meaningless, as it is always enabled. Just
unconditionally enable support for external timestamp pins and remove this
unnecessary flag.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d7999f5e 13-Sep-2023 Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>

ice: implement dpll interface to control cgu

Control over clock generation unit is required for further development
of Synchronous Ethernet feature. Interface provides ability to obtain
current state of a dpll, its sources and outputs which are pins, and
allows their configuration.

Co-developed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 8a3a565f 13-Sep-2023 Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>

ice: add admin commands to access cgu configuration

Add firmware admin command to access clock generation unit
configuration, it is required to enable Extended PTP and SyncE features
in the driver.
Add definitions of possible hardware variations of input and output pins
related to clock generation unit and functions to access the data.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# fb9840c4 08-Aug-2023 Przemek Kitszel <przemyslaw.kitszel@intel.com>

ice: split ice_aq_wait_for_event() func into two

Mitigate race between registering on wait list and receiving
AQ Response from FW.

ice_aq_prep_for_event() should be called before sending AQ command,
ice_aq_wait_for_event() should be called after sending AQ command,
to wait for AQ Response.

Please note, that this was found by reading the code,
an actual race has not yet materialized.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# b214b98a 08-Aug-2023 Przemek Kitszel <przemyslaw.kitszel@intel.com>

ice: embed &ice_rq_event_info event into struct ice_aq_task

Expose struct ice_aq_task to callers,
what takes burden of memory ownership out from AQ-wait family of functions,
and reduces need for heap-based allocations.

Embed struct ice_rq_event_info event into struct ice_aq_task
(instead of it being a ptr) to remove some more code from the callers.

Subsequent commit will improve more based on this one.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# bb52f42a 20-Jun-2023 Dave Ertman <david.m.ertman@intel.com>

ice: Add driver support for firmware changes for LAG

Add the defines, fields, and detection code for FW support of LAG for
SRIOV. Also exposes some previously static functions to allow access
in the lag code.

Clean up code that is unused or not needed for LAG support. Also add
an ordered workqueue for processing LAG events.

Reviewed-by: Daniel Machon <daniel.machon@microchip.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 2946204b 12-Jul-2023 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: implement bridge port vlan

Port VLAN in this case means push and pop VLAN action on specific vid.
There are a few limitation in hardware:
- push and pop can't be used separately
- if port VLAN is used there can't be any trunk VLANs, because pop
action is done on all traffic received by VSI in port VLAN mode
- port VLAN mode on uplink port isn't supported

Reflect these limitations in code using dev_info to inform the user
about unsupported configuration.

In bridge mode there is a need to configure port vlan without resetting
VFs. To do that implement ice_port_vlan_on/off() functions. They are
only configuring correct vlan_ops to allow setting port vlan.

We also need to clear port vlan without resetting the VF which is not
supported right now. Change it by implementing clear_port_vlan ops.
As previous VLAN configuration isn't always the same, store current
config while creating port vlan and restore it in clear function.

Configuration steps:
- configure switchdev with bridge
- #bridge vlan add dev eth0 vid 120 pvid untagged
- #bridge vlan add dev eth1 vid 120 pvid untagged
- ping from VF0 to VF1

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f6e8fb55 12-Jul-2023 Wojciech Drewek <wojciech.drewek@intel.com>

ice: Implement basic eswitch bridge setup

With this patch, ice driver is able to track if the port
representors or uplink port were added to the linux bridge in
switchdev mode. Listen for NETDEV_CHANGEUPPER events in order to
detect this. ice_esw_br data structure reflects the linux bridge
and stores all the ports of the bridge (ice_esw_br_port) in
xarray, it's created when the first port is added to the bridge and
freed once the last port is removed. Note that only one bridge is
supported per eswitch.

Bridge port (ice_esw_br_port) can be either a VF port representor
port or uplink port (ice_esw_br_port_type). In both cases bridge port
holds a reference to the VSI, VF's VSI in case of the PR and uplink
VSI in case of the uplink. VSI's index is used as an index to the
xarray in which ports are stored.

Add a check which prevents configuring switchdev mode if uplink is
already added to any bridge. This is needed because we need to listen
for NETDEV_CHANGEUPPER events to record if the uplink was added to
the bridge. Netdevice notifier is registered after eswitch mode
is changed to switchdev.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 6e8b2c88 01-Jun-2023 Karol Kolacinski <karol.kolacinski@intel.com>

ice: handle extts in the miscellaneous interrupt thread

The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both
scheduled on the same kthread worker, pf.ptp.kworker. The
ice_ptp_periodic_work() function sends to the firmware to interact with the
PHY, and must block to wait for responses.

This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt
cause, ultimately resulting in disruption to processing an input signal of
the frequency is high enough. In our testing, even 100 Hz signals get
disrupted.

Fix this by instead processing the signal inside the miscellaneous
interrupt thread prior to handling Tx timestamps.

Use atomic bits in a new pf->misc_thread bitmap in order to safely
communicate which tasks require processing within the
ice_misc_intr_thread_fn(). This ensures the communication of desired tasks
from the ice_misc_intr() are correctly processed without racing even in the
event that the interrupt triggers again before the thread function exits.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1c769b1a 16-May-2023 Dave Ertman <david.m.ertman@intel.com>

ice: Remove LAG+SRIOV mutual exclusion

There was a change previously to stop SR-IOV and LAG from existing on the
same interface. This was to prevent the violation of LACP (Link
Aggregation Control Protocol). The method to achieve this was to add a
no-op Rx handler onto the netdev when SR-IOV VFs were present, thus
blocking bonding, bridging, etc from claiming the interface by adding
its own Rx handler. Also, when an interface was added into a aggregate,
then the SR-IOV capability was set to false.

There are some users that have in house solutions using both SR-IOV and
bridging/bonding that this method interferes with (e.g. creating duplicate
VFs on the bonded interfaces and failing between them when the interface
fails over).

It makes more sense to provide the most functionality
possible, the restriction on co-existence of these features will be
removed. No additional functionality is currently being provided beyond
what existed before the co-existence restriction was put into place. It is
up to the end user to not implement a solution that would interfere with
existing network protocols.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 9136e1f1 26-Apr-2023 Paul Greenwalt <paul.greenwalt@intel.com>

ice: refactor PHY type to ethtool link mode

Refactor ice_phy_type_to_ethtool to use phy_type_[low|high]_lkup table to
map PHY type to AQ link speed and ethtool link mode. This removes
complexity and simplifies future changes.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 011670cc 15-May-2023 Piotr Raczynski <piotr.raczynski@intel.com>

ice: add dynamic interrupt allocation

Currently driver can only allocate interrupt vectors during init phase by
calling pci_alloc_irq_vectors. Change that and make use of new
pci_msix_alloc_irq_at/pci_msix_free_irq API and enable to allocate and free
more interrupts after MSIX has been enabled. Since not all platforms
supports dynamic allocation, check it with pci_msix_can_alloc_dyn.

Extend the tracker to keep track how many interrupts are allocated
initially so when all such vectors are already used, additional interrupts
are automatically allocated dynamically. Remember each interrupt allocation
method to then free appropriately. Since some features may require
interrupts allocated dynamically add appropriate VSI flag and take it into
account when allocating new interrupt.

Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# cfebc0a3 15-May-2023 Piotr Raczynski <piotr.raczynski@intel.com>

ice: track interrupt vectors with xarray

Replace custom interrupt tracker with generic xarray data structure.
Remove all code responsible for searching for a new entry with xa_alloc,
which always tries to allocate at the lowes possible index. As a result
driver is always using a contiguous region of the MSIX vector table.

New tracker keeps ice_irq_entry entries in xarray as opaque for the rest
of the driver hiding the entry details from the caller.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 4aad5335 15-May-2023 Piotr Raczynski <piotr.raczynski@intel.com>

ice: add individual interrupt allocation

Currently interrupt allocations, depending on a feature are distributed
in batches. Also, after allocation there is a series of operations that
distributes per irq settings through that batch of interrupts.

Although driver does not yet support dynamic interrupt allocation, keep
allocated interrupts in a pool and add allocation abstraction logic to
make code more flexible. Keep per interrupt information in the
ice_q_vector structure, which yields ice_vsi::base_vector redundant.
Also, as a result there are a few functions that can be removed.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 38e97a98 15-May-2023 Piotr Raczynski <piotr.raczynski@intel.com>

ice: move interrupt related code to separate file

Keep interrupt handling code in a dedicated file. This helps keep driver
structured better and prepares for more functionality added to this file.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# ddd652ef 06-Mar-2023 Bjorn Helgaas <bhelgaas@google.com>

ice: Remove unnecessary aer.h include

<linux/aer.h> is unused, so remove it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 248401cb 10-Mar-2023 Dave Ertman <david.m.ertman@intel.com>

ice: avoid bonding causing auxiliary plug/unplug under RTNL lock

RDMA is not supported in ice on a PF that has been added to a bonded
interface. To enforce this, when an interface enters a bond, we unplug
the auxiliary device that supports RDMA functionality. This unplug
currently happens in the context of handling the netdev bonding event.
This event is sent to the ice driver under RTNL context. This is causing
a deadlock where the RDMA driver is waiting for the RTNL lock to complete
the removal.

Defer the unplugging/re-plugging of the auxiliary device to the service
task so that it is not performed under the RTNL lock context.

Cc: stable@vger.kernel.org # 6.1.x
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Link: https://lore.kernel.org/netdev/CAK8fFZ6A_Gphw_3-QMGKEFQk=sfCw1Qmq0TVZK3rtAi7vb621A@mail.gmail.com/
Fixes: 5cb1ebdbc434 ("ice: Fix race condition during interface enslave")
Fixes: 4eace75e0853 ("RDMA/irdma: Report the correct link speed")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230310194833.3074601-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# fce92dbc 07-Feb-2023 Pawel Chmielewski <pawel.chmielewski@intel.com>

ice: add support BIG TCP on IPv6

Enable sending BIG TCP packets on IPv6 in the ice driver using generic
ipv6_hopopt_jumbo_remove helper for stripping HBH header.

Tested:
netperf -t TCP_RR -H 2001:db8:0:f101::1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,TRANSACTION_RATE

Tested on two different setups. In both cases, the following settings were
applied after loading the changed driver:

ip link set dev enp175s0f1np1 gso_max_size 130000
ip link set dev enp175s0f1np1 gro_max_size 130000
ip link set dev enp175s0f1np1 mtu 9000

First setup:
Before:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
134 279 410 3961.584

After:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
135 178 216 6093.404

The other setup:
Before:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
218 414 478 2944.765

After:
Minimum 90th 99th Transaction
Latency Percentile Percentile Rate
Microseconds Latency Latency Tran/s
Microseconds Microseconds
146 238 266 4700.596

Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 9adafe2b 04-Feb-2023 Vladimir Oltean <vladimir.oltean@nxp.com>

net/sched: move struct tc_mqprio_qopt_offload from pkt_cls.h to pkt_sched.h

Since mqprio is a scheduler and not a classifier, move its offload
structure to pkt_sched.h, where struct tc_taprio_qopt_offload also lies.

Also update some header inclusions in drivers that access this
structure, to the best of my abilities.

Cc: Igor Russkikh <irusskikh@marvell.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Horatiu Vultur <horatiu.vultur@microchip.com>
Cc: Lars Povlsen <lars.povlsen@microchip.com>
Cc: Steen Hegelund <Steen.Hegelund@microchip.com>
Cc: Daniel Machon <daniel.machon@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 5b246e53 20-Dec-2022 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: split probe into smaller functions

Part of code from probe can be reused in reload flow. Move this code to
separate function. Create unroll functions for each part of
initialization, like: ice_init_dev() and ice_deinit_dev(). It
simplifies unrolling and can be used in remove flow.

Avoid freeing port info as it could be reused in reload path.
Will be freed in remove path since is allocated via devm_kzalloc().

Also clean the remove path to reflect the init steps.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0db66d20 20-Dec-2022 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: cleanup in VSI config/deconfig code

Do few small cleanups:

1) Rename the function to reflect that it doesn't configure all things
related to VSI. ice_vsi_cfg_lan() better fits to what function is doing.

ice_vsi_cfg() can be use to name function that will configure whole VSI.

2) Remove unused ethtype field from VSI. There is no need to set
ethtype here, because it is never used.

3) Remove unnecessary check for ICE_VSI_CHNL. There is check for
ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before
calling the function.

4) Simplify ice_vsi_alloc() call. There is no need to check the type of
VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL vf is always NULL
(ice_vsi_setup() is called with vf=NULL).
For ICE_VSI_VF or ICE_VSI_CTRL ch is always NULL and for other VSI types
ch and vf are always NULL.

5) Remove unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will
be called in ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() ->
ice_vsi_dis_irq()). Remove unnecessary call.

6) Don't remove specific filters in release. All hw filters are removed
in ice_fltr_remove_alli(), which is always called in VSI release flow.
There is no need to remove only ethertype filters before calling
ice_fltr_remove_all().

7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
free memory allocated in ice_vsi_alloc() rename it to ice_vsi_free()
which better shows what function is doing.

8) Free coalesce param in rebuild. There is potential memory leak if
configuration of VSI lan fails. Free coalesce to avoid it.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 2b8db6af 20-Dec-2022 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: move RDMA init to ice_idc.c

Simplify probe flow by moving all RDMA related code to ice_init_rdma().
Unroll irq allocation if RDMA initialization fails.

Implement ice_deinit_rdma() and use it in remove flow.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c7ef8221 18-Jan-2023 Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>

ice: use GNSS subsystem instead of TTY

Previously support for GNSS was implemented as a TTY driver, it allowed
to access GNSS receiver on /dev/ttyGNSS_<bus><func>.

Use generic GNSS subsystem API instead of implementing own TTY driver.
The receiver is accessible on /dev/gnss<id>. In case of multiple receivers
in the OS, correct device can be found by enumerating either:
- /sys/class/net/<eth port>/device/gnss/
- /sys/class/gnss/gnss<id>/device/

Using GNSS subsystem is superior to implementing own TTY driver, as the
GNSS subsystem was designed solely for this purpose. It also implements
TTY driver but in a common and defined way.

From user perspective, there is no difference in communicating with a
device, except new path to the device shall be used. The device will
provide same information to the userspace as the old one, and can be used
in the same way, i.e.:
old # gpsmon /dev/ttyGNSS_2100_0
new # gpsmon /dev/gnss0
There is no other impact on userspace tools.

User expecting onboard GNSS receiver support is required to enable
CONFIG_GNSS=y/m in kernel config.

Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


# a6a0974a 24-Jan-2023 Dave Ertman <david.m.ertman@intel.com>

ice: Prevent set_channel from changing queues while RDMA active

The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.

Prevent set_channel from executing when RDMA driver bound to auxiliary
device.

Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.

Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 288ecf49 18-Nov-2022 Benjamin Mikailenko <benjamin.mikailenko@intel.com>

ice: Accumulate ring statistics over reset

Resets may occur with or without user interaction. For example, a TX hang
or reconfiguration of parameters will result in a reset. During reset, the
VSI is freed, freeing any statistics structures inside as well. This would
create an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks ring statistics expecting them
to be populated.

To ensure this doesn't happen, accumulate ring statistics over reset.

Define a new ring statistics structure, ice_ring_stats. The new structure
lives in the VSI's parent, preserving ring statistics when VSI is freed.

1. Define a new structure vsi_ring_stats in the PF scope
2. Allocate/free stats only during probe, unload, or change in ring size
3. Replace previous ring statistics functionality with new structure

Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 2fd5e433 18-Nov-2022 Benjamin Mikailenko <benjamin.mikailenko@intel.com>

ice: Accumulate HW and Netdev statistics over reset

Resets happen with or without user interaction. For example, incidents
such as TX hang or a reconfiguration of parameters will result in a reset.
During reset, hardware and software statistics were set to zero. This
created an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks statistics expecting them to
be populated.

To ensure this doesn't happen, keep accumulating stats over reset.

1. Remove function calls which reset hardware and netdev statistics.
2. Do not rollover statistics in ice_stat_update40 during reset.

Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e753df8f 25-Oct-2022 Michal Jaron <michalx.jaron@intel.com>

ice: Add support Flex RXD

Add new VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC flag, opcode
VIRTCHNL_OP_GET_SUPPORTED_RXDIDS and add member rxdid
in struct virtchnl_rxq_info to support AVF Flex RXD
extension.

Add support to allow VF to query flexible descriptor RXDIDs supported
by DDP package and configure Rx queues with selected RXDID for IAVF.

Add code to allow VIRTCHNL_OP_GET_SUPPORTED_RXDIDS message to be
processed. Add necessary macros for registers.

Signed-off-by: Leyi Rong <leyi.rong@intel.com>
Signed-off-by: Xu Ting <ting.xu@intel.com>
Signed-off-by: Michal Jaron <michalx.jaron@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221025161252.1952939-1-jacob.e.keller@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 143b86f3 21-Oct-2022 Amritha Nambiar <amritha.nambiar@intel.com>

ice: Enable RX queue selection using skbedit action

This patch uses TC skbedit queue_mapping action to support
forwarding packets to a device queue. Such filters with action
forward to queue will be the highest priority switch filter in
HW.
Example:
$ tc filter add dev ens4f0 protocol ip ingress flower\
dst_ip 192.168.1.12 ip_proto tcp dst_port 5001\
action skbedit queue_mapping 5 skip_sw

The above command adds an ingress filter, incoming packets
qualifying the match will be accepted into queue 5. The queue
number is in decimal format.

Refactored ice_add_tc_flower_adv_fltr() to consolidate code with
action FWD_TO_VSI and FWD_TO QUEUE.

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# dddd406d 27-Jul-2022 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: Implement control of FCS/CRC stripping

The driver can allow the user to configure whether the CRC aka the FCS
(Frame Check Sequence) is DMA'd to the host as part of the receive
buffer. The driver usually wants this feature disabled so that the
hardware checks the FCS and strips it in order to save PCI bandwidth.

Control the reception of FCS to the host using the command:
ethtool -K eth0 rx-fcs <on|off>

The default shown in ethtool -k eth0 | grep fcs; should be "off", as the
hardware will drop any frame with a bad checksum, and DMA of the
checksum is useless overhead especially for small packets.

Testing Hints:
test the FCS/CRC arrives with received packets using
tcpdump -nnpi eth0 -xxxx
and it should show crc data as the last 4 bytes of the packet. Can also
use wireshark to turn on CRC checking and check the data is correct.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 9ead7e74 11-Aug-2022 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: xsk: use Rx ring's XDP ring when picking NAPI context

Ice driver allocates per cpu XDP queues so that redirect path can safely
use smp_processor_id() as an index to the array. At the same time
though, XDP rings are used to pick NAPI context to call napi_schedule()
or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
num_possible_cpus() of underlying platform is 44, then this means queue
vectors with correlated NAPI contexts will carry several XDP queues.

This in turn can result in a broken behavior where NAPI context of
interest will never be scheduled and AF_XDP socket will not process any
traffic.

To fix this, let us change the way how XDP rings are assigned to Rx
rings and use this information later on when setting
ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
queue vector and walk through Tx ring's linked list. Once we stumble
upon XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.

Previous [0] approach of fixing this issue was for txonly scenario
because of the described grouping of XDP rings across queue vectors. So,
relying on Rx ring meant that NAPI context could be scheduled with a
queue vector without XDP ring with associated XSK pool.

[0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/

Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d7393425 04-Jul-2022 Michal Wilczynski <michal.wilczynski@intel.com>

ice: Introduce enabling promiscuous mode on multiple VF's

In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.

Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.

Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.

Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().

Refactor unnecessarily passed function arguments.

To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
traffic is seen on both of them.

Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# cd8efeee 18-Jul-2022 Marcin Szycik <marcin.szycik@linux.intel.com>

ice: Add support for PPPoE hardware offload

Add support for creating PPPoE filters in switchdev mode. Add support
for parsing PPPoE and PPP-specific tc options: pppoe_sid and ppp_proto.

Example filter:
tc filter add dev $PF1 ingress protocol ppp_ses prio 1 flower pppoe_sid \
1234 ppp_proto ip skip_sw action mirred egress redirect dev $VF1_PR

Changes in iproute2 are required to use the new fields.

ICE COMMS DDP package is required to create a filter as it contains PPPoE
profiles. Added a warning message when loaded DDP package does not contain
required profiles.

Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d6b98c8d 24-Jun-2022 Karol Kolacinski <karol.kolacinski@intel.com>

ice: add write functionality for GNSS TTY

Add the possibility to write raw bytes to the GNSS module through the
first TTY device. This allows user to configure the module.

Create a second read-only TTY device.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 896a55aa 27-Jun-2022 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add EXTTS feature to the feature bitmap

External time stamp sources are supported only on certain devices. Enforce
the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature
bitmap set.

Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 295819b5 23-Mar-2022 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: introduce common helper for retrieving VSI by vsi_num

Both ice_idc.c and ice_virtchnl.c carry their own implementation of a
helper function that is looking for a given VSI based on provided
vsi_num. Their functionality is the same, so let's introduce the common
function in ice.h that both of the mentioned sites will use.

This is a strictly cleanup thing, no functionality is changed.

Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 486b9eee 22-Apr-2022 Ivan Vecera <ivecera@redhat.com>

ice: Fix race during aux device (un)plugging

Function ice_plug_aux_dev() assigns pf->adev field too early prior
aux device initialization and on other side ice_unplug_aux_dev()
starts aux device deinit and at the end assigns NULL to pf->adev.
This is wrong because pf->adev should always be non-NULL only when
aux device is fully initialized and ready. This wrong order causes
a crash when ice_send_event_to_aux() call occurs because that function
depends on non-NULL value of pf->adev and does not assume that
aux device is half-initialized or half-destroyed.
After order correction the race window is tiny but it is still there,
as Leon mentioned and manipulation with pf->adev needs to be protected
by mutex.

Fix (un-)plugging functions so pf->adev field is set after aux device
init and prior aux device destroy and protect pf->adev assignment by
new mutex. This mutex is also held during ice_send_event_to_aux()
call to ensure that aux device is valid during that call.
Note that device lock used ice_send_event_to_aux() needs to be kept
to avoid race with aux drv unload.

Reproducer:
cycle=1
while :;do
echo "#### Cycle: $cycle"

ip link set ens7f0 mtu 9000
ip link add bond0 type bond mode 1 miimon 100
ip link set bond0 up
ifenslave bond0 ens7f0
ip link set bond0 mtu 9000
ethtool -L ens7f0 combined 1
ip link del bond0
ip link set ens7f0 mtu 1500
sleep 1

let cycle++
done

In short when the device is added/removed to/from bond the aux device
is unplugged/plugged. When MTU of the device is changed an event is
sent to aux device asynchronously. This can race with (un)plugging
operation and because pf->adev is set too early (plug) or too late
(unplug) the function ice_send_event_to_aux() can touch uninitialized
or destroyed fields. In the case of crash below pf->adev->dev.mutex.

Crash:
[ 53.372066] bond0: (slave ens7f0): making interface the new active one
[ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
idating tc mappings. Priority traffic classification disabled!
[ 54.248204] bond0: (slave ens7f0): Releasing backup interface
[ 54.253955] bond0: (slave ens7f1): making interface the new active one
[ 54.274875] bond0: (slave ens7f1): Releasing backup interface
[ 54.289153] bond0 (unregistering): Released all slaves
[ 55.383179] MII link monitoring set to 100 ms
[ 55.398696] bond0: (slave ens7f0): making interface the new active one
[ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
[ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
p link
[ 55.412198] #PF: supervisor write access in kernel mode
[ 55.412200] #PF: error_code(0x0002) - not-present page
[ 55.412201] PGD 25d2ad067 P4D 0
[ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
5.17.0-13579-g57f2d6540f03 #1
[ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
link
[ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
2021
[ 55.430226] Workqueue: ice ice_service_task [ice]
[ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20
[ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
[ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
[ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
[ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
[ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
[ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
[ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
[ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
[ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
[ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 55.567305] PKRU: 55555554
[ 55.570018] Call Trace:
[ 55.572474] <TASK>
[ 55.574579] ice_service_task+0xaab/0xef0 [ice]
[ 55.579130] process_one_work+0x1c5/0x390
[ 55.583141] ? process_one_work+0x390/0x390
[ 55.587326] worker_thread+0x30/0x360
[ 55.590994] ? process_one_work+0x390/0x390
[ 55.595180] kthread+0xe6/0x110
[ 55.598325] ? kthread_complete_and_exit+0x20/0x20
[ 55.603116] ret_from_fork+0x1f/0x30
[ 55.606698] </TASK>

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f9124c68 17-Mar-2022 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: synchronize_rcu() when terminating rings

Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.

This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5be757 ("net/i40e: Fix concurrency issues
between config flow and XSK")

Fixes: efc2214b6047 ("ice: Add support for XDP")
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Shwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1273f895 31-Mar-2022 Ivan Vecera <ivecera@redhat.com>

ice: Fix broken IFF_ALLMULTI handling

Handling of all-multicast flag and associated multicast promiscuous
mode is broken in ice driver. When an user switches allmulticast
flag on or off the driver checks whether any VLANs are configured
over the interface (except default VLAN 0).

If any extra VLANs are registered it enables multicast promiscuous
mode for all these VLANs (including default VLAN 0) using
ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
multicast packets tagged with known VLAN ID or untagged are received
and multicast packets tagged with unknown VLAN ID ignored.

If no extra VLANs are registered (so only VLAN 0 exists) it enables
multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
look-up type. In this situation any multicast packets including
tagged ones are received.

The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:

ice_vsi_sync_fltr() {
...
if (changed_flags & IFF_ALLMULTI) {
if (netdev->flags & IFF_ALLMULTI) {
if (vsi->num_vlans > 1)
ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
} else {
if (vsi->num_vlans > 1)
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
else
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
}
}
...
}

The code above depends on value vsi->num_vlan that specifies number
of VLANs configured over the interface (including VLAN 0) and
this is problem because that value is modified in NDO callbacks
ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().

Scenario 1:
1. ip link set ens7f0 allmulticast on
2. ip link add vlan10 link ens7f0 type vlan id 10
3. ip link set ens7f0 allmulticast off
4. ip link set ens7f0 allmulticast on

[1] In this scenario IFF_ALLMULTI is enabled and the driver calls
ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
multicast promisc rule with non-VLAN look-up type.
[2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
[3] Command switches IFF_ALLMULTI off and the driver calls
ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
call is effectively NOP because it looks for multicast promisc
rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
rules exist. So the all-multicast remains enabled silently
in hardware.
[4] Command tries to switch IFF_ALLMULTI on and the driver calls
ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
fails (-EEXIST) because non-VLAN multicast promisc rule already
exists.

Scenario 2:
1. ip link add vlan10 link ens7f0 type vlan id 10
2. ip link set ens7f0 allmulticast on
3. ip link add vlan20 link ens7f0 type vlan id 20
4. ip link del vlan10 ; ip link del vlan20
5. ip link set ens7f0 allmulticast off

[1] VLAN with ID 10 is added and vsi->num_vlan==2
[2] Command switches IFF_ALLMULTI on and driver installs multicast
promisc rules with VLAN look-up type for VLAN 0 and 10
[3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
promisc rules is added for this new VLAN so the interface does
not receive MC packets from VLAN 20
[4] Both VLANs are removed but multicast rule for VLAN 10 remains
installed so interface receives multicast packets from VLAN 10
[5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
the driver tries to remove multicast promisc rule for VLAN 0
with non-VLAN look-up that does not exist.
All-multicast looks disabled from user point of view but it
is partially enabled in HW (interface receives all multicast
packets either untagged or tagged with VLAN ID 10)

To resolve these issues the patch introduces these changes:
1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
and IFF_ALLMULTI is enabled an appropriate multicast promisc
rule for that VLAN ID is added/removed.
2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_VLAN_PROMISC_BITS.
3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
type for existing multicast promisc rule for VLAN 0 is updated
to ICE_MCAST_PROMISC_BITS.
4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
bit protection to avoid races with ice_vsi_sync_fltr() that runs
in ice_service_task() context.
5. Bit ICE_VSI_VLAN_FLTR_CHANGED is use-less and can be removed.
6. Error messages added to ice_fltr_*_vsi_promisc() helper functions
to avoid them in their callers
7. Small improvements to increase readability

Fixes: 5eda8afd6bcc ("ice: Add support for PF/VF promiscuous mode")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1ac2524d 28-Mar-2022 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: xsk: Fix indexing in ice_tx_xsk_pool()

Ice driver tries to always create XDP rings array to be
num_possible_cpus() sized, regardless of user's queue count setting that
can be changed via ethtool -L for example.

Currently, ice_tx_xsk_pool() calculates the qid by decrementing the
ring->q_index by the count of XDP queues, but ring->q_index is set to 'i
+ vsi->alloc_txq'.

When user did ethtool -L $IFACE combined 1, alloc_txq is 1, but
vsi->num_xdp_txq is still num_possible_cpus(). Then, ice_tx_xsk_pool()
will do OOB access and in the final result ring would not get xsk_pool
pointer assigned. Then, each ice_xsk_wakeup() call will fail with error
and it will not be possible to get into NAPI and do the processing from
driver side.

Fix this by decrementing vsi->alloc_txq instead of vsi->num_xdp_txq from
ring-q_index in ice_tx_xsk_pool() so the calculation is reflected to the
setting of ring->q_index.

Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220328142123.170157-5-maciej.fijalkowski@intel.com


# 32d53c0a 23-Mar-2022 Alexander Lobakin <alexandr.lobakin@intel.com>

ice: fix 'scheduling while atomic' on aux critical err interrupt

There's a kernel BUG splat on processing aux critical error
interrupts in ice_misc_intr():

[ 2100.917085] BUG: scheduling while atomic: swapper/15/0/0x00010000
...
[ 2101.060770] Call Trace:
[ 2101.063229] <IRQ>
[ 2101.065252] dump_stack+0x41/0x60
[ 2101.068587] __schedule_bug.cold.100+0x4c/0x58
[ 2101.073060] __schedule+0x6a4/0x830
[ 2101.076570] schedule+0x35/0xa0
[ 2101.079727] schedule_preempt_disabled+0xa/0x10
[ 2101.084284] __mutex_lock.isra.7+0x310/0x420
[ 2101.088580] ? ice_misc_intr+0x201/0x2e0 [ice]
[ 2101.093078] ice_send_event_to_aux+0x25/0x70 [ice]
[ 2101.097921] ice_misc_intr+0x220/0x2e0 [ice]
[ 2101.102232] __handle_irq_event_percpu+0x40/0x180
[ 2101.106965] handle_irq_event_percpu+0x30/0x80
[ 2101.111434] handle_irq_event+0x36/0x53
[ 2101.115292] handle_edge_irq+0x82/0x190
[ 2101.119148] handle_irq+0x1c/0x30
[ 2101.122480] do_IRQ+0x49/0xd0
[ 2101.125465] common_interrupt+0xf/0xf
[ 2101.129146] </IRQ>
...

As Andrew correctly mentioned previously[0], the following call
ladder happens:

ice_misc_intr() <- hardirq
ice_send_event_to_aux()
device_lock()
mutex_lock()
might_sleep()
might_resched() <- oops

Add a new PF state bit which indicates that an aux critical error
occurred and serve it in ice_service_task() in process context.
The new ice_pf::oicr_err_reg is read-write in both hardirq and
process contexts, but only 3 bits of non-critical data probably
aren't worth explicit synchronizing (and they're even in the same
byte [31:24]).

[0] https://lore.kernel.org/all/YeSRUVmrdmlUXHDn@lunn.ch

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Tested-by: Michal Kubiak <michal.kubiak@intel.com>
Acked-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 649c87c6 22-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: remove circular header dependencies on ice.h

Several headers in the ice driver include ice.h even though they are
themselves included by that header. The most notable of these is
ice_common.h, but several other headers also do this.

Such a recursive inclusion is problematic as it forces headers to be
included in a strict order, otherwise compilation errors can result. The
circular inclusions do not trigger an endless loop due to standard
header inclusion guards, however other errors can occur.

For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.

ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
creates a circular dependency.

The definition in ice_sriov.h requires things from ice_flow.h, but
ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
process for expanding ice.h. The current code avoids this issue by
having an implicit dependency without the include of ice_flow.h.

If we were to fix that so that ice_sriov.h explicitly depends on
ice_flow.h the following pattern would occur:

ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h

At this point, during the expansion of, the header guard for ice_flow.h
is already set, so when ice_sriov.h attempts to load the ice_flow.h
header it is skipped. Then, we go on to begin including the rest of
ice_sriov.h, including structure definitions which depend on
ice_rss_hash_cfg. This produces a compiler warning because
ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
start of ice_flow.h!

If the order of headers is incorrect (ice_flow.h is not implicitly
loaded first in all files which include ice_sriov.h) then we get the
same failure.

Removing this recursive inclusion requires fixing a few cases where some
headers depended on the header inclusions from ice.h. In addition, a few
other changes are also required.

Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
which is the likely reason that ice_common.h includes ice.h at all. This
macro implementation requires the full definition of ice_pf in order to
properly compile.

Fix this by moving it to a function declared in ice_main.c, so that we
do not require all files to depend on the layout of the ice_pf
structure.

Note that this change only fixes circular dependencies, but it does not
fully resolve all implicit dependencies where one header may depend on
the inclusion of another. I tried to fix as many of the implicit
dependencies as I noticed, but fixing them all requires a somewhat
tedious analysis of each header and attempting to compile it separately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0deb0bf7 22-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: rename ice_virtchnl_pf.c to ice_sriov.c

The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
code for implementing Single Root IOV virtualization resides. This code
includes support for bringing up and tearing down VFs, hooks into the
kernel SR-IOV netdev operations, and for handling virtchnl messages from
VFs.

In the future, we plan to support Scalable IOV in addition to Single
Root IOV as an alternative virtualization scheme. This implementation
will re-use some but not all of the code in ice_virtchnl_pf.c

To prepare for this future, we want to refactor and split up the code in
ice_virtchnl_pf.c into the following scheme:

* ice_vf_lib.[ch]

Basic VF structures and accessors. This is where scheme-independent
code will reside.

* ice_virtchnl.[ch]

Virtchnl message handling. This is where the bulk of the logic for
processing messages from VFs using the virtchnl messaging scheme will
reside. This is separated from ice_vf_lib.c because it is distinct
and has a bulk of the processing code.

* ice_sriov.[ch]

Single Root IOV implementation, including initialization and the
routines for interacting with SR-IOV based netdev operations.

* (future) ice_siov.[ch]

Scalable IOV implementation.

As a first step, lets assume that all of the code in
ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
ice_sriov.c and its header to ice_sriov.h

Future changes will further split out the code in these files following
the plan outlined here.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d775155a 22-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: rename ice_sriov.c to ice_vf_mbx.c

The ice_sriov.c file primarily contains code which handles the logic for
mailbox overflow detection and some other utility functions related to
the virtualization mailbox.

The bulk of the SR-IOV implementation is actually found in
ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.

In the future, the ice driver will support an additional virtualization
scheme known as Scalable IOV, and the code in this file will be used
for this alternative implementation.

Rename this file (and its associated header) to ice_vf_mbx.c, so that we
can later re-use the ice_sriov.c file as the SR-IOV specific file.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 9a225f81 04-Mar-2022 Marcin Szycik <marcin.szycik@linux.intel.com>

ice: Support GTP-U and GTP-C offload in switchdev

Add support for creating filters for GTP-U and GTP-C in switchdev mode. Add
support for parsing GTP-specific options (QFI and PDU type) and TEID.

By default, a filter for GTP-U will be added. To add a filter for GTP-C,
specify enc_dst_port = 2123, e.g.:

tc filter add dev $GTP0 ingress prio 1 flower enc_key_id 1337 \
enc_dst_port 2123 action mirred egress redirect dev $VF1_PR

Note: GTP-U with outer IPv6 offload is not supported yet.
Note: GTP-U with no payload offload is not supported yet.

Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c8ff29b5 27-Jan-2022 Marcin Szycik <marcin.szycik@linux.intel.com>

ice: Add slow path offload stats on port representor in switchdev

Implement callbacks to check for stats and fetch port representor stats.
Stats are taken from RX/TX ring corresponding to port representor and show
the number of bytes/packets that were not offloaded.

To see slow path stats run:
ifstat -x cpu_hits -a

Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 000773c0 16-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: factor VF variables to separate structure

We maintain a number of values for VFs within the ice_pf structure. This
includes the VF table, the number of allocated VFs, the maximum number
of supported SR-IOV VFs, the number of queue pairs per VF, the number of
MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.

We're about to add a few more variables to this list. Clean this up
first by extracting these members out into a new ice_vfs structure
defined in ice_virtchnl_pf.h

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# b03d519d 16-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: store VF pointer instead of VF ID

The VSI structure contains a vf_id field used to associate a VSI with a
VF. This is used mainly for ICE_VSI_VF as well as partially for
ICE_VSI_CTRL associated with the VFs.

This API was designed with the idea that VFs are stored in a simple
array that was expected to be static throughout most of the driver's
life.

We plan on refactoring VF storage in a few key ways:

1) converting from a simple static array to a hash table
2) using krefs to track VF references obtained from the hash table
3) use RCU to delay release of VF memory until after all references
are dropped

This is motivated by the goal to ensure that the lifetime of VF
structures is accounted for, and prevent various use-after-free bugs.

With the existing vsi->vf_id, the reference tracking for VFs would
become somewhat convoluted, because each VSI maintains a vf_id field
which will then require performing a look up. This means all these flows
will require reference tracking and proper usage of rcu_read_lock, etc.

We know that the VF VSI will always be backed by a valid VF structure,
because the VSI is created during VF initialization and removed before
the VF is destroyed. Rely on this and store a reference to the VF in the
VSI structure instead of storing a VF ID. This will simplify the usage
and avoid the need to perform lookups on the hash table in the future.

For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
debugging code which violates this assumption and avoid more disastrous
panics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 43113ff7 01-Mar-2022 Karol Kolacinski <karol.kolacinski@intel.com>

ice: add TTY for GNSS module for E810T device

Add a new ice_gnss.c file for holding the basic GNSS module functions.
If the device supports GNSS module, call the new ice_gnss_init and
ice_gnss_release functions where appropriate.

Implement basic functionality for reading the data from GNSS module
using TTY device.

Add I2C read AQ command. It is now required for controlling the external
physical connectors via external I2C port expander on E810-T adapters.

Future changes will introduce write functionality.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 88f62aea 11-Feb-2022 Dave Ertman <david.m.ertman@intel.com>

ice: Simplify tracking status of RDMA support

The status of support for RDMA is currently being tracked with two
separate status flags. This is unnecessary with the current state of
the driver.

Simplify status tracking down to a single flag.

Rename the helper function to denote the RDMA specific status and
universally use the helper function to test the status bit.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# f1da5a08 02-Dec-2021 Brett Creeley <brett.creeley@intel.com>

ice: Add ability for PF admin to enable VF VLAN pruning

VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers
expect VFs to receive all VLAN tagged traffic with a matching
destination MAC.

Add an ethtool private flag 'vf-vlan-pruning' and set the default to
off so VFs will receive all VLAN traffic directed towards them. When
the flag is turned on, VF will only be able to receive untagged
traffic or traffic with VLAN tags it has created interfaces for.

Also, the flag cannot be changed while any VFs are allocated. This was
done to simplify the implementation. So, if this flag is needed, then
the PF admin must enable it. If the user tries to enable the flag while
VFs are active, then print an unsupported message with the
vf-vlan-pruning flag included. In case multiple flags were specified, this
makes it clear to the user which flag failed.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c31af68a 02-Dec-2021 Brett Creeley <brett.creeley@intel.com>

ice: Add outer_vlan_ops and VSI specific VLAN ops implementations

Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
ops are only available when the device is in Double VLAN Mode (DVM).
Depending on the VSI type, the requirements for what operations to
use/allow differ.

By default all VSI's have unsupported inner and outer VSI VLAN ops. This
implementation was chosen to prevent unexpected crashes due to null
pointer dereferences. Instead, if a VSI calls an unsupported op, it will
just return -EOPNOTSUPP.

Add implementations to support modifying outer VLAN fields for VSI
context. This includes the ability to modify VLAN stripping, insertion,
and the port VLAN based on the outer VLAN handling fields of the VSI
context.

These functions should only ever be used if DVM is enabled because that
means the firmware supports the outer VLAN fields in the VSI context. If
the device is in DVM, then always use the outer_vlan_ops, else use the
vlan_ops since the device is in Single VLAN Mode (SVM).

Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
ice_vsi_vlan_setup() as the latter function is specific to the PF and
all other VSI types that need an untagged VLAN 0 filter already do this
in their specific flows. Without this change, Flow Director is failing
to initialize because it does not implement any VSI VLAN ops.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# bc42afa9 02-Dec-2021 Brett Creeley <brett.creeley@intel.com>

ice: Add new VSI VLAN ops

Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
offloads require more flexibility when configuring VLANs. The VSI VLAN
interface will allow flexibility for configuring VLANs for all VSI
types. Add new files to separate the VSI VLAN ops and move functions to
make the code more organized.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 5cb1ebdb 10-Mar-2022 Ivan Vecera <ivecera@redhat.com>

ice: Fix race condition during interface enslave

Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating
auxiliary device") changes a process of re-creation of aux device
so ice_plug_aux_dev() is called from ice_service_task() context.
This unfortunately opens a race window that can result in dead-lock
when interface has left LAG and immediately enters LAG again.

Reproducer:
```
#!/bin/sh

ip link add lag0 type bond mode 1 miimon 100
ip link set lag0

for n in {1..10}; do
echo Cycle: $n
ip link set ens7f0 master lag0
sleep 1
ip link set ens7f0 nomaster
done
```

This results in:
[20976.208697] Workqueue: ice ice_service_task [ice]
[20976.213422] Call Trace:
[20976.215871] __schedule+0x2d1/0x830
[20976.219364] schedule+0x35/0xa0
[20976.222510] schedule_preempt_disabled+0xa/0x10
[20976.227043] __mutex_lock.isra.7+0x310/0x420
[20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
[20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
[20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core]
[20976.261079] ib_register_device+0x40d/0x580 [ib_core]
[20976.266139] irdma_ib_register_device+0x129/0x250 [irdma]
[20976.281409] irdma_probe+0x2c1/0x360 [irdma]
[20976.285691] auxiliary_bus_probe+0x45/0x70
[20976.289790] really_probe+0x1f2/0x480
[20976.298509] driver_probe_device+0x49/0xc0
[20976.302609] bus_for_each_drv+0x79/0xc0
[20976.306448] __device_attach+0xdc/0x160
[20976.310286] bus_probe_device+0x9d/0xb0
[20976.314128] device_add+0x43c/0x890
[20976.321287] __auxiliary_device_add+0x43/0x60
[20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice]
[20976.330109] ice_service_task+0xd0c/0xed0 [ice]
[20976.342591] process_one_work+0x1a7/0x360
[20976.350536] worker_thread+0x30/0x390
[20976.358128] kthread+0x10a/0x120
[20976.365547] ret_from_fork+0x1f/0x40
...
[20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084
[20976.446469] Call Trace:
[20976.448921] __schedule+0x2d1/0x830
[20976.452414] schedule+0x35/0xa0
[20976.455559] schedule_preempt_disabled+0xa/0x10
[20976.460090] __mutex_lock.isra.7+0x310/0x420
[20976.464364] device_del+0x36/0x3c0
[20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice]
[20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice]
[20976.477288] notifier_call_chain+0x47/0x70
[20976.481386] __netdev_upper_dev_link+0x18b/0x280
[20976.489845] bond_enslave+0xe05/0x1790 [bonding]
[20976.494475] do_setlink+0x336/0xf50
[20976.502517] __rtnl_newlink+0x529/0x8b0
[20976.543441] rtnl_newlink+0x43/0x60
[20976.546934] rtnetlink_rcv_msg+0x2b1/0x360
[20976.559238] netlink_rcv_skb+0x4c/0x120
[20976.563079] netlink_unicast+0x196/0x230
[20976.567005] netlink_sendmsg+0x204/0x3d0
[20976.570930] sock_sendmsg+0x4c/0x50
[20976.574423] ____sys_sendmsg+0x1eb/0x250
[20976.586807] ___sys_sendmsg+0x7c/0xc0
[20976.606353] __sys_sendmsg+0x57/0xa0
[20976.609930] do_syscall_64+0x5b/0x1a0
[20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca

1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
is called from ice_service_task() context, aux device is created
and associated device->lock is taken.
2. Command 'ip link ... set master...' calls ice's notifier under
RTNL lock and that notifier calls ice_unplug_aux_dev(). That
function tries to take aux device->lock but this is already taken
by ice_plug_aux_dev() in step 1
3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
taken in step 2
4. Dead-lock

The patch fixes this issue by following changes:
- Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
call in ice_service_task()
- The bit is checked in ice_clear_rdma_cap() and only if it is not set
then ice_unplug_aux_dev() is called. If it is set (in other words
plugging of aux device was requested and ice_plug_aux_dev() is
potentially running) then the function only clears the bit
- Once ice_plug_aux_dev() call (in ice_service_task) is finished
the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
whether it was already cleared by ice_clear_rdma_cap(). If so then
aux device is unplugged.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Co-developed-by: Petr Oros <poros@redhat.com>
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# 97b01291 18-Feb-2022 Dave Ertman <david.m.ertman@intel.com>

ice: Fix error with handling of bonding MTU

When a bonded interface is destroyed, .ndo_change_mtu can be called
during the tear-down process while the RTNL lock is held. This is a
problem since the auxiliary driver linked to the LAN driver needs to be
notified of the MTU change, and this requires grabbing a device_lock on
the auxiliary_device's dev. Currently this is being attempted in the
same execution context as the call to .ndo_change_mtu which is causing a
dead-lock.

Move the notification of the changed MTU to a separate execution context
(watchdog service task) and eliminate the "before" notification.

Fixes: 348048e724a0e ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# fadead80 07-Feb-2022 Jacob Keller <jacob.e.keller@intel.com>

ice: fix concurrent reset and removal of VFs

Commit c503e63200c6 ("ice: Stop processing VF messages during teardown")
introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
intended to prevent some issues with concurrently handling messages from
VFs while tearing down the VFs.

This change was motivated by crashes caused while tearing down and
bringing up VFs in rapid succession.

It turns out that the fix actually introduces issues with the VF driver
caused because the PF no longer responds to any messages sent by the VF
during its .remove routine. This results in the VF potentially removing
its DMA memory before the PF has shut down the device queues.

Additionally, the fix doesn't actually resolve concurrency issues within
the ice driver. It is possible for a VF to initiate a reset just prior
to the ice driver removing VFs. This can result in the remove task
concurrently operating while the VF is being reset. This results in
similar memory corruption and panics purportedly fixed by that commit.

Fix this concurrency at its root by protecting both the reset and
removal flows using the existing VF cfg_lock. This ensures that we
cannot remove the VF while any outstanding critical tasks such as a
virtchnl message or a reset are occurring.

This locking change also fixes the root cause originally fixed by commit
c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we
can simply revert it.

Note that I kept these two changes together because simply reverting the
original commit alone would leave the driver vulnerable to worse race
conditions.

Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 5dbbbd01 20-Jan-2022 Dave Ertman <david.m.ertman@intel.com>

ice: Avoid RTNL lock when re-creating auxiliary device

If a call to re-create the auxiliary device happens in a context that has
already taken the RTNL lock, then the call flow that recreates auxiliary
device can hang if there is another attempt to claim the RTNL lock by the
auxiliary driver.

To avoid this, any call to re-create auxiliary devices that comes from
an source that is holding the RTNL lock (e.g. netdev notifier when
interface exits a bond) should execute in a separate thread. To
accomplish this, add a flag to the PF that will be evaluated in the
service task and dealt with there.

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 40319796 29-Dec-2021 Kiran Patil <kiran.patil@intel.com>

ice: Add flow director support for channel mode

Add support to enable flow-director filter when multiple TCs are
configured. Flow director filter can be configured using ethtool
(--config-ntuple option). When multiple TCs are configured, each
TC is mapped to an unique HW VSI. So VSI corresponding to queue
used in filter is identified and flow director context is updated
with correct VSI while configuring ntuple filter in HW.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 399e27db 27-Oct-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: support immediate firmware activation via devlink reload

The ice hardware contains an embedded chip with firmware which can be
updated using devlink flash. The firmware which runs on this chip is
referred to as the Embedded Management Processor firmware (EMP
firmware).

Activating the new firmware image currently requires that the system be
rebooted. This is not ideal as rebooting the system can cause unwanted
downtime.

In practical terms, activating the firmware does not always require a
full system reboot. In many cases it is possible to activate the EMP
firmware immediately. There are a couple of different scenarios to
cover.

* The EMP firmware itself can be reloaded by issuing a special update
to the device called an Embedded Management Processor reset (EMP
reset). This reset causes the device to reset and reload the EMP
firmware.

* PCI configuration changes are only reloaded after a cold PCIe reset.
Unfortunately there is no generic way to trigger this for a PCIe
device without a system reboot.

When performing a flash update, firmware is capable of responding with
some information about the specific update requirements.

The driver updates the flash by programming a secondary inactive bank
with the contents of the new image, and then issuing a command to
request to switch the active bank starting from the next load.

The response to the final command for updating the inactive NVM flash
bank includes an indication of the minimum reset required to fully
update the device. This can be one of the following:

* A full power on is required
* A cold PCIe reset is required
* An EMP reset is required

The response to the command to switch flash banks includes an indication
of whether or not the firmware will allow an EMP reset request.

For most updates, an EMP reset is sufficient to load the new EMP
firmware without issues. In some cases, this reset is not sufficient
because the PCI configuration space has changed. When this could cause
incompatibility with the new EMP image, the firmware is capable of
rejecting the EMP reset request.

Add logic to ice_fw_update.c to handle the response data flash update
AdminQ commands.

For the reset level, issue a devlink status notification informing the
user of how to complete the update with a simple suggestion like
"Activate new firmware by rebooting the system".

Cache the status of whether or not firmware will restrict the EMP reset
for use in implementing devlink reload.

Implement support for devlink reload with the "fw_activate" flag. This
allows user space to request the firmware be activated immediately.

For the .reload_down handler, we will issue a request for the EMP reset
using the appropriate firmware AdminQ command. If we know that the
firmware will not allow an EMP reset, simply exit with a suitable
netlink extended ACK message indicating that the EMP reset is not
available.

For the .reload_up handler, simply wait until the driver has finished
resetting. Logic to handle processing of an EMP reset already exists in
the driver as part of its reset and rebuild flows.

Implement support for the devlink reload interface with the
"fw_activate" action. This allows userspace to request activation of
firmware without a reboot.

Note that support for indicating the required reset and EMP reset
restriction is not supported on old versions of firmware. The driver can
determine if the two features are supported by checking the device
capabilities report. I confirmed support has existed since at least
version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
EMP reset request has existed in all version of the EMP firmware for the
ice hardware.

Check the device capabilities report to determine whether or not the
indications are reported by the running firmware. If the reset
requirement indication is not supported, always assume a full power on
is necessary. If the reset restriction capability is not supported,
always assume the EMP reset is available.

Users can verify if the EMP reset has activated the firmware by using
the devlink info report to check that the 'running' firmware version has
updated. For example a user might do the following:

# Check current version
$ devlink dev info

# Update the device
$ devlink dev flash pci/0000:af:00.0 file firmware.bin

# Confirm stored version updated
$ devlink dev info

# Reload to activate new firmware
$ devlink dev reload pci/0000:af:00.0 action fw_activate

# Confirm running version updated
$ devlink dev info

Finally, this change does *not* implement basic driver-only reload
support. I did look into trying to do this. However, it requires
significant refactor of how the ice driver probes and loads everything.
The ice driver probe and allocation flows were not designed with such
a reload in mind. Refactoring the flow to support this is beyond the
scope of this change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 78ad87da 11-Oct-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: devlink: add shadow-ram region to snapshot Shadow RAM

We have a region for reading the contents of the NVM flash as
a snapshot. This region does not allow reading the Shadow RAM, as it
always passes the FLASH_ONLY bit to the low level firmware interface.

Add a separate shadow-ram region which will allow snapshot of the
current contents of the Shadow RAM. This data is built from the NVM
contents but is distinct as the device builds up the Shadow RAM during
initialization, so being able to snapshot its contents can be useful
when attempting to debug flash related issues.

Fix the comment description of the nvm-flash region which incorrectly
stated that it filled the shadow-ram region, and add a comment
explaining that the nvm-flash region does not actually read the Shadow
RAM.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 5f87ec48 07-Oct-2021 Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Remove string printing for ice_status

Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>


# e523af4e 18-Oct-2021 Shiraz Saleem <shiraz.saleem@intel.com>

net/ice: Add support for enable_iwarp and enable_roce devlink param

Allow support for 'enable_iwarp' and 'enable_roce' devlink params to turn
on/off iWARP or RoCE protocol support for E800 devices.

For example, a user can turn on iWARP functionality with,

devlink dev param set pci/0000:07:00.0 name enable_iwarp value true cmode runtime

This add an iWARP auxiliary rdma device, ice.iwarp.<>, under this PF.

A user request to enable both iWARP and RoCE under the same PF is rejected
since this device does not support both protocols simultaneously on the
same port.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1a8c7778 26-Feb-2021 Brett Creeley <brett.creeley@intel.com>

ice: Fix VF true promiscuous mode

When a VF requests promiscuous mode and it's trusted and true promiscuous
mode is enabled the PF driver attempts to enable unicast and/or
multicast promiscuous mode filters based on the request. This is fine,
but there are a couple issues with the current code.

[1] The define to configure the unicast promiscuous mode mask also
includes bits to configure the multicast promiscuous mode mask, which
causes multicast to be set/cleared unintentionally.
[2] All 4 cases for enable/disable unicast/multicast mode are not
handled in the promiscuous mode message handler, which causes
unexpected results regarding the current promiscuous mode settings.

To fix [1] make sure any promiscuous mask defines include the correct
bits for each of the promiscuous modes.

To fix [2] make sure that all 4 cases are handled since there are 2 bits
(FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be
either set or cleared. Also, since either unicast and/or multicast
promiscuous configuration can fail, introduce two separate error values
to handle each of these cases.

Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 99d40752 13-Oct-2021 Brett Creeley <brett.creeley@intel.com>

ice: Add support to print error on PHY FW load failure

Some devices have support for loading the PHY FW and in some cases this
can fail. When this fails, the FW will set the corresponding bit in the
link info structure. Also, the FW will send a link event if the correct
link event mask bit is set. Add support for printing an error message
when the PHY FW load fails during any link configuration flow and the
link event flow.

Since ice_check_module_power() is already doing something very similar
add a new function ice_check_link_cfg_err() so any failures reported in
the link info's link_cfg_err member can be printed in this one function.

Also, add the new ICE_FLAG_PHY_FW_LOAD_FAILED bit to the PF's flags so
we don't constantly print this error message during link polling if the
value never changed.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 195bb48f 12-Oct-2021 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: support for indirect notification

Implement indirect notification mechanism to support offloading TC rules
on tunnel devices.

Keep indirect block list in netdev priv. Notification will call setting
tc cls flower function. For now we can offload only ingress type. Return
not supported for other flow block binder.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 9fea7498 15-Oct-2021 Kiran Patil <kiran.patil@intel.com>

ice: Add tc-flower filter support for channel

Add support to add/delete channel specific filter using tc-flower.
For now, only supported action is "skip_sw hw_tc <tc_num>"

Filter criteria is specific to channel and it can be
combination of L3, L3+L4, L2+L4.

Example:
MATCH criteria Action
---------------------------
src and/or dest IPv4[6]/mask -> Forward to "hw_tc <tc_num>"
dest IPv4[6]/mask + dest L4 port -> Forward to "hw_tc <tc_num>"
dest MAC + dest L4 port -> Forward to "hw_tc <tc_num>"
src IPv4[6]/mask + src L4 port -> Forward to "hw_tc <tc_num>"
src MAC + src L4 port -> Forward to "hw_tc <tc_num>"

Adding tc-flower filter for channel using "hw_tc"
-------------------------------------------------
tc qdisc add dev <ethX> clsact

Above two steps are only needed the first time when adding
tc-flower filter.

tc filter add dev <ethX> protocol ip ingress prio 1 flower \
dst_ip 192.168.0.1/32 ip_proto tcp dst_port 5001 \
skip_sw hw_tc 1

tc filter show dev <ethX> ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1 hw_tc 1
eth_type ipv4
ip_proto tcp
dst_ip 192.168.0.1
dst_port 5001
skip_sw
in_hw in_hw_count 1

Delete specific filter:
-------------------------
tc filter del dev <ethx> ingress pref 1 handle 0x1 flower

Delete All filters:
------------------
tc filter del dev <ethX> ingress

Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# fbc7b27a 15-Oct-2021 Kiran Patil <kiran.patil@intel.com>

ice: enable ndo_setup_tc support for mqprio_qdisc

Add support in driver for TC_QDISC_SETUP_MQPRIO. This support
enables instantiation of channels in HW using existing MQPRIO
infrastructure which is extended to be offloadable. This
provides a mechanism to configure dedicated set of queues for
each TC.

Configuring channels using "tc mqprio":
--------------------------------------
tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
queues 4@0 4@4 4@8 hw 1 mode channel

Above command configures 3 TCs having 4 queues each. "hw 1 mode channel"
implies offload of channel configuration to HW. When driver processes
configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each
TC maps to HW VSI with specified queues.

User can optionally specify bandwidth min and max rate limit per TC
(see example below). If shaper params like min and/or max bandwidth
rate limit are specified, driver configures VSI specific rate limiter
in HW.

Configuring channels and bandwidth shaper parameters using "tc mqprio":
----------------------------------------------------------------
tc qdisc add dev <ethX> root mqprio \
num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
max_rate 4Gbit 5Gbit 6Gbit 7Gbit

Command to view configured TCs:
-----------------------------
tc qdisc show dev <ethX>

Deleting TCs:
------------
tc qdisc del dev <ethX> root mqprio

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0754d65b 15-Oct-2021 Kiran Patil <kiran.patil@intel.com>

ice: Add infrastructure for mqprio support via ndo_setup_tc

Add infrastructure required for "ndo_setup_tc:qdisc_mqprio".
ice_vsi_setup is modified to configure traffic classes based
on mqprio data received from the stack. This includes low-level
functions to configure min, max rate-limit parameters in hardware
for traffic classes. Each traffic class gets mapped to a hardware
channel (VSI) which can be individually configured with different
bandwidth parameters.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 2faf63b6 19-Aug-2021 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: make use of ice_for_each_* macros

Go through the code base and use ice_for_each_* macros. While at it,
introduce ice_for_each_xdp_txq() macro that can be used for looping over
xdp_rings array.

Commit is not introducing any new functionality.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 22bf877e 19-Aug-2021 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: introduce XDP_TX fallback path

Under rare circumstances there might be a situation where a requirement
of having XDP Tx queue per CPU could not be fulfilled and some of the Tx
resources have to be shared between CPUs. This yields a need for placing
accesses to xdp_ring inside a critical section protected by spinlock.
These accesses happen to be in the hot path, so let's introduce the
static branch that will be triggered from the control plane when driver
could not provide Tx queue dedicated for XDP on each CPU.

Currently, the design that has been picked is to allow any number of XDP
Tx queues that is at least half of a count of CPUs that platform has.
For lower number driver will bail out with a response to user that there
were not enough Tx resources that would allow configuring XDP. The
sharing of rings is signalled via static branch enablement which in turn
indicates that lock for xdp_ring accesses needs to be taken in hot path.

Approach based on static branch has no impact on performance of a
non-fallback path. One thing that is needed to be mentioned is a fact
that the static branch will act as a global driver switch, meaning that
if one PF got out of Tx resources, then other PFs that ice driver is
servicing will suffer. However, given the fact that HW that ice driver
is handling has 1024 Tx queues per each PF, this is currently an
unlikely scenario.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e72bba21 19-Aug-2021 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: split ice_ring onto Tx/Rx separate structs

While it was convenient to have a generic ring structure that served
both Tx and Rx sides, next commits are going to introduce several
Tx-specific fields, so in order to avoid hurting the Rx side, let's
pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.

Rx ring could be handled by the old ice_ring which would reduce the code
churn within this patch, but this would make things asymmetric.

Make the union out of the ring container within ice_q_vector so that it
is possible to iterate over newly introduced ice_tx_ring.

Remove the @size as it's only accessed from control path and it can be
calculated pretty easily.

Change definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
used for both Rx and Tx rings.

Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
difference now.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 325b2064 17-Aug-2021 Maciej Machnikowski <maciej.machnikowski@intel.com>

ice: Implement support for SMA and U.FL on E810-T

Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and
allow controlling them.

E810-T adapters are equipped with:
- 2 external bidirectional SMA connectors
- 1 internal TX U.FL
- 1 internal RX U.FL

U.FL connectors share signal lines with the SMA connectors. The TX U.FL1
share the line with the SMA1 and the RX U.FL2 share line with the SMA2.
This dependence is controlled by the ice_verify_pin_e810t.

Additionally add support for the E810-T-based devices which don't use the
SMA/U.FL controller. If the IO expander is not detected don't expose pins
and use 2 predefined 1PPS input and output pins.

Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0d08a441 06-Aug-2021 Kiran Patil <kiran.patil@intel.com>

ice: ndo_setup_tc implementation for PF

Implement ndo_setup_tc net device callback for TC HW offload on PF device.

ndo_setup_tc provides support for HW offloading various TC filters.
Add support for configuring the following filter with tc-flower:
- default L2 filters (src/dst mac addresses, ethertype, VLAN)
- variations of L3, L3+L4, L2+L3+L4 filters using advanced filters
(including ipv4 and ipv6 addresses).

Allow for adding/removing TC flows when PF device is configured in
eswitch switchdev mode. Two types of actions are supported at the
moment: FLOW_ACTION_DROP and FLOW_ACTION_REDIRECT.

Co-developed-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Priyalee Kushwaha <priyalee.kushwaha@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 7aae80ce 19-Aug-2021 Wojciech Drewek <wojciech.drewek@intel.com>

ice: add port representor ethtool ops and stats

Introduce the following ethtool operations for VF's representor:
-get_drvinfo
-get_strings
-get_ethtool_stats
-get_sset_count
-get_link

In all cases, existing operations were used with minor
changes which allow us to detect if ethtool op was called for
representor. Only VF VSI stats will be available for representor.

Implement ndo_get_stats64 for port representor. This will update
VF VSI stats and read them.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f66756e0 19-Aug-2021 Grzegorz Nitka <grzegorz.nitka@intel.com>

ice: introduce new type of VSI for switchdev

New type of VSI has to be defined for switchdev control plane
VSI. Number of allocated Tx and Rx queue has to be equal to
amount of VFs, because each port representor should have one
Tx and Rx queue.

Also to not increase number of used irqs too much, control plane
VSI uses only one q_vector and handle all queues in one irq.
To allow handling all queues in one irq , new function to clean
msix for eswitch was introduced. This function will schedule napi
for each representor instead of scheduling it only for one like in
normal clean irq function.

Only one additional msix has to be requested. Always try to request
it in ice_ena_msix_range function.

Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1a1c40df 19-Aug-2021 Grzegorz Nitka <grzegorz.nitka@intel.com>

ice: set and release switchdev environment

Switchdev environment has to be set up when user create VFs
and eswitch mode is switchdev. Release is done when user
delete all VFs.

Data path in this implementation is based on control plane VSI.
This VSI is used to pass traffic from port representors to
corresponding VFs and vice versa. Default TX rule has to be
added to forward packet to control plane VSI. This will redirect
packets from VFs which don't match other rules to control plane
VSI.

On RX side default rule is added on uplink VSI to receive all
traffic that doesn't match other rules. When setting switchdev
environment all other rules from VFs should be removed. Packet to
VFs will be forwarded by control plane VSI.

As VF without any mac rules can't send any packet because of
antispoof mechanism, VSI antispoof should be turned off on each VFs.

To send packet from representor to correct VSI, destination VSI
field in TX descriptor will have to be filled. Allow that by
setting destination override bit in control plane VSI security config.

Packet from VFs will be received on control plane VSI. Driver
should decide to which netdev forward the packet. Decision is
made based on src_vsi field from descriptor. There is a target
netdev list in control plane VSI struct which choose netdev
based on src_vsi number.

Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 37165e3f 19-Aug-2021 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: introduce VF port representor

Port representor is used to manage VF from host side. To allow
it each created representor registers netdevice with random hw
address. Also devlink port is created for all representors.

Port representor name is created based on switch id or managed
by devlink core if devlink port was registered with success.

Open and stop ndo ops are implemented to allow managing the VF
link state. Link state is tracked in VF struct.

Struct ice_netdev_priv is extended by pointer to representor
field. This is needed to get correct representor from netdev
struct mostly used in ndo calls.

Implement helper functions to check if given netdev is netdev of
port representor (ice_is_port_repr_netdev) and to get representor
from netdev (ice_netdev_to_repr).

As driver mostly will create or destroy port representors on all
VFs instead of on single one, write functions to add and remove
representor for each VF.

Representor struct contains pointer to source VSI, which is VSI
configured on VF, backpointer to VF, backpointer to netdev,
q_vector pointer and metadata_dst which will be used in data path.

Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 2ae0aa47 19-Aug-2021 Wojciech Drewek <wojciech.drewek@intel.com>

ice: Move devlink port to PF/VF struct

Keeping devlink port inside VSI data structure causes some issues.
Since VF VSI is released during reset that means that we have to
unregister devlink port and register it again every time reset is
triggered. With the new changes in devlink API it
might cause deadlock issues. After calling
devlink_port_register/devlink_port_unregister devlink API is going to
lock rtnl_mutex. It's an issue when VF reset is triggered in netlink
operation context (like setting VF MAC address or VLAN),
because rtnl_lock is already taken by netlink. Another call of
rtnl_lock from devlink API results in dead-lock.

By moving devlink port to PF/VF we avoid creating/destroying it
during reset. Since this patch, devlink ports are created during
ice_probe, destroyed during ice_remove for PF and created during
ice_repr_add, destroyed during ice_repr_rem for VF.

Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 3ea9bd5d 19-Aug-2021 Michal Swiatkowski <michal.swiatkowski@linux.intel.com>

ice: support basic E-Switch mode control

Write set and get eswitch mode functions used by devlink
ops. Use new pf struct member eswitch_mode to track current
eswitch mode in driver.

Changing eswitch mode is only allowed when there are no
VFs created.

Create new file for eswitch related code.

Add config flag ICE_SWITCHDEV to allow user to choose if
switchdev support should be enabled or disabled.

Use case examples:
- show current eswitch mode ('legacy' is the default one)
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode legacy

- move to 'switchdev' mode
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode
switchdev
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode switchdev

- create 2 VFs
[root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs

- unsuccessful attempt to change eswitch mode while VFs are created
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
devlink answers: Operation not supported

- destroy VFs
[root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs

- restore 'legacy' mode
[root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
[root@localhost]# devlink dev eswitch show pci/0000:03:00.1
pci/0000:03:00.1: mode legacy

Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 40b24760 16-Jul-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add feature bitmap, helpers and a check for DSCP

DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this,
this patch introduces a bitmap of features and helper functions.

The feature bitmap is set based on device IDs on driver init. Currently,
DSCP is the only feature in this bitmap, but there will be more in the
future. In the DCB netlink flow, check if the feature bit is set before
exercising DSCP.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# bfe84435 09-Sep-2021 Dave Ertman <david.m.ertman@intel.com>

ice: Correctly deal with PFs that do not support RDMA

There are two cases where the current PF does not support RDMA
functionality. The first is if the NVM loaded on the device is set
to not support RDMA (common_caps.rdma is false). The second is if
the kernel bonding driver has included the current PF in an active
link aggregate.

When the driver has determined that this PF does not support RDMA, then
auxiliary devices should not be created on the auxiliary bus. Without
a device on the auxiliary bus, even if the irdma driver is present, there
will be no RDMA activity attempted on this PF.

Currently, in the reset flow, an attempt to create auxiliary devices is
performed without regard to the ability of the PF. There needs to be a
check in ice_aux_plug_dev (as the central point that creates auxiliary
devices) to see if the PF is in a state to support the functionality.

When disabling and re-enabling RDMA due to the inclusion/removal of the PF
in a link aggregate, we also need to set/clear the bit which controls
auxiliary device creation so that a reset recovery in a link aggregate
situation doesn't try to create auxiliary devices when it shouldn't.

Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# c503e632 04-Aug-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Stop processing VF messages during teardown

When VFs are setup and torn down in quick succession, it is possible
that a VF is torn down by the PF while the VF's virtchnl requests are
still in the PF's mailbox ring. Processing the VF's virtchnl request
when the VF itself doesn't exist results in undefined behavior. Fix
this by adding a check to stop processing virtchnl requests when VF
teardown is in progress.

Fixes: ddf30f7ff840 ("ice: Add handler to configure SR-IOV")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 06c16d89 09-Jun-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: register 1588 PTP clock device object for E810 devices

Add a new ice_ptp.c file for holding the basic PTP clock interface
functions. If the device supports PTP, call the new ice_ptp_init and
ice_ptp_release functions where appropriate.

If the function owns the hardware resource associated with the PTP
hardware clock, register with the PTP_1588_CLOCK infrastructure to
allocate a new clock object that represents the device hardware clock.

Implement basic functionality for reading and setting the clock time,
performing clock adjustments, and adjusting the clock frequency.

Future changes will introduce functionality for handling related
features including Tx and Rx timestamps.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 8f5ee3c4 09-Jun-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: add support for sideband messages

In order to support certain device features, including enabling the PTP
hardware clock, the ice driver needs to control some registers on the
device PHY.

These registers are accessed by sending sideband messages. For some
hardware, these messages must be sent over the device admin queue, while
other hardware has a dedicated control queue for the sideband messages.

Add the neighbor device message structure for sending a message to the
neighboring device. Where supported, initialize the sideband control
queue and handle cleanup.

Add a wrapper function for sending sideband control queue messages that
read or write a neighboring device register.

Because some devices send sideband messages over the AdminQ, also
increase the length of the admin queue to allow more messages to be
queued up. This is important because the sideband messages add
additional pressure on the AQ usage.

This support will be used in following patches to enable support for
CONFIG_1588_PTP_CLOCK.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c77849f5 06-May-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Detect and report unsupported module power levels

Determine whether an unsupported power configuration is preventing link
establishment by storing and checking the link_cfg_err_byte. Print error
messages when module power levels are unsupported. Also add a new flag
bit to prevent spamming said error messages.

Co-developed-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Jeb Cramer <jeb.j.cramer@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1c08052e 06-May-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: wait for reset before reporting devlink info

Requesting device firmware information while the device is busy cleaning
up after a reset can result in an unexpected failure:

This occurs because the command is attempting to access the device
AdminQ while it is down. Resolve this by having the command wait for
a while until the reset is complete. To do this, introduce
a reset_wait_queue and associated helper function "ice_wait_for_reset".

This helper will use the wait queue to sleep until the driver is done
rebuilding. Use of a wait queue is preferred because the potential sleep
duration can be several seconds.

To ensure that the thread wakes up properly, a new wake_up call is added
during all code paths which clear the reset state bits associated with
the driver rebuild flow.

Using this ensures that tools can request device information without
worrying about whether the driver is cleaning up from a reset.
Specifically, it is expected that a flash update could result in
a device reset, and it is better to delay the response for information
until the reset is complete rather than exit with an immediate failure.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f9f5301e 20-May-2021 Dave Ertman <david.m.ertman@intel.com>

ice: Register auxiliary device to provide RDMA

Register ice client auxiliary RDMA device on the auxiliary bus per
PCIe device function for the auxiliary driver (irdma) to attach to.
It allows to realize a single RDMA driver (irdma) capable of working with
multiple netdev drivers over multi-generation Intel HW supporting RDMA.
There is no load ordering dependencies between ice and irdma.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 348048e7 20-May-2021 Dave Ertman <david.m.ertman@intel.com>

ice: Implement iidc operations

Add implementations for supporting iidc operations for device operation
such as allocation of resources and event notifications.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d25a0fc4 20-May-2021 Dave Ertman <david.m.ertman@intel.com>

ice: Initialize RDMA support

Probe the device's capabilities to see if it supports RDMA. If so, allocate
and reserve resources to support its operation; populate structures with
initial values.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e102db78 27-Apr-2021 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: track AF_XDP ZC enabled queues in bitmap

Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
silently introduced a regression and broke the Tx side of AF_XDP in copy
mode. xsk_pool on ice_ring is set only based on the existence of the XDP
prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed.
That is not something that should happen for copy mode as it should use
the regular data path ice_clean_tx_irq.

This results in a following splat when xdpsock is run in txonly or l2fwd
scenarios in copy mode:

<snip>
[ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 106.057269] #PF: supervisor read access in kernel mode
[ 106.062493] #PF: error_code(0x0000) - not-present page
[ 106.067709] PGD 0 P4D 0
[ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
[ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
[ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
[ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
[ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
[ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
[ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
[ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
[ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
[ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
[ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
[ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 106.192898] PKRU: 55555554
[ 106.195653] Call Trace:
[ 106.198143] <IRQ>
[ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
[ 106.205087] ice_napi_poll+0x3e/0x590 [ice]
[ 106.209356] __napi_poll+0x2a/0x160
[ 106.212911] net_rx_action+0xd6/0x200
[ 106.216634] __do_softirq+0xbf/0x29b
[ 106.220274] irq_exit_rcu+0x88/0xc0
[ 106.223819] common_interrupt+0x7b/0xa0
[ 106.227719] </IRQ>
[ 106.229857] asm_common_interrupt+0x1e/0x40
</snip>

Fix this by introducing the bitmap of queues that are zero-copy enabled,
where each bit, corresponding to a queue id that xsk pool is being
configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and
checked within ice_xsk_pool(). The latter is a function used for
deciding which napi poll routine is executed.
Idea is being taken from our other drivers such as i40e and ixgbe.

Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0891c896 02-Mar-2021 Vignesh Sridhar <vignesh.sridhar@intel.com>

ice: warn about potentially malicious VFs

Attempt to detect malicious VFs and, if suspected, log the information but
keep going to allow the user to take any desired actions.

Potentially malicious VFs are identified by checking if the VFs are
transmitting too many messages via the PF-VF mailbox which could cause an
overflow of this channel resulting in denial of service. This is done by
creating a snapshot or static capture of the mailbox buffer which can be
traversed and in which the messages sent by VFs are tracked.

Co-developed-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram <yashaswini.raghuram.prathivadi.bhayankaram@intel.com>
Co-developed-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# cdf1f1f1 31-Mar-2021 Jacob Keller <jacob.e.keller@intel.com>

ice: replace custom AIM algorithm with kernel's DIM library

The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.

The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.

The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.

The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).

There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.

The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.

Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# a476d72a 02-Mar-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add new VSI states to track netdev alloc/registration

Add two new VSI states, one to track if a netdev for the VSI has been
allocated and the other to track if the netdev has been registered.
Call unregister_netdev/free_netdev only when the corresponding state
bits are set.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 7e408e07 02-Mar-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Drop leading underscores in enum ice_pf_state

Remove the leading underscores in enum ice_pf_state. This is not really
communicating anything and is unnecessary. No functional change.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d41f26b5 02-Mar-2021 Bruce Allan <bruce.w.allan@intel.com>

ice: use kernel definitions for IANA protocol ports and ether-types

The well-known IANA protocol port 3260 (iscsi-target 0x0cbc) and the
ether-types 0x8906 (ETH_P_FCOE) and 0x8914 (ETH_P_FIP) are already defined
in kernel header files. Use those definitions instead of open-coding the
same.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 51fe27e1 25-Mar-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Remove rx_gro_dropped stat

Tracking of the rx_gro_dropped statistic was removed in
commit f73fc40327c0 ("ice: drop dead code in ice_receive_skb()").
Remove the associated variables and its reporting to ethtool stats.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e97fb1ae 02-Mar-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Consolidate VSI state and flags

struct ice_vsi has two fields, state and flags which seem to
be serving the same purpose. Consolidate them into one field
'state'.

enum ice_state is used to represent state information of the PF.
While some of these enum values can be use to represent VSI state,
it makes more sense to represent VSI state with its own enum. So
derive a new enum ice_vsi_state from ice_vsi_flags and ice_state
and use it. Also rename enum ice_state to ice_pf_state for clarity.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# b66a972a 02-Mar-2021 Brett Creeley <brett.creeley@intel.com>

ice: Refactor ice_set/get_rss into LUT and key specific functions

Currently ice_set/get_rss are used to set/get the RSS LUT and/or RSS
key. However nearly everywhere these functions are called only the LUT
or key are set/get. Also, making this change reduces how many things
ice_set/get_rss are doing. Fix this by adding ice_set/get_rss_lut and
ice_set/get_rss_key functions.

Also, consolidate all calls for setting/getting the RSS LUT and RSS Key
to use ice_set/get_rss_lut() and ice_set/get_rss_key().

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 8134d5ff 02-Mar-2021 Brett Creeley <brett.creeley@intel.com>

ice: Change ice_vsi_setup_q_map() to not depend on RSS

Currently, ice_vsi_setup_q_map() depends on the VSI's rss_size. However,
the Rx Queue Mapping section of the VSI context has no dependency on RSS.
Instead, limit the maximum number of Rx queues per TC based on the Rx
Queue mapping section of the VSI context, which currently allows for up
to 256 Rx queues per TC.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d6218317 08-Mar-2021 Qi Zhang <qi.z.zhang@intel.com>

ice: Check FDIR program status for AVF

Enable returning FDIR completion status by checking the
ctrl_vsi Rx queue descriptor value.

To enable returning FDIR completion status from ctrl_vsi Rx queue,
COMP_Queue and COMP_Report of FDIR filter programming descriptor
needs to be properly configured. After program request sent to ctrl_vsi
Tx queue, ctrl_vsi Rx queue interrupt will be triggered and
completion status will be returned.

Driver will first issue request in ice_vc_fdir_add_fltr(), then
pass FDIR context to the background task in interrupt service routine
ice_vc_fdir_irq_handler() and finally deal with them in
ice_flush_fdir_ctx(). ice_flush_fdir_ctx() will check the descriptor's
value, fdir context, and then send back virtual channel message to VF
by calling ice_vc_add_fdir_fltr_post(). An additional timer will be
setup in case of hardware interrupt timeout.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# da62c5ff 08-Mar-2021 Qi Zhang <qi.z.zhang@intel.com>

ice: Add support for per VF ctrl VSI enabling

We are going to enable FDIR configure for AVF through virtual channel.
The first step is to add helper functions to support control VSI setup.
A control VSI will be allocated for a VF when AVF creates its
first FDIR rule through ice_vf_ctrl_vsi_setup().
The patch will also allocate FDIR rule space for VF's control VSI.
If a VF asks for flow director rules, then those should come entirely
from the best effort pool and not from the guaranteed pool. The patch
allow a VF VSI to have only space in the best effort rules.

Signed-off-by: Xiaoyun Li <xiaoyun.li@intel.com>
Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 31765519 26-Feb-2021 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Use port number instead of PF ID for WoL

As per the spec, the WoL control word read from the NVM should be
interpreted as port numbers, and not PF numbers. So when checking
if WoL supported, use the port number instead of the PF ID.

Also, ice_is_wol_supported doesn't really need a pointer to the pf
struct, but just needs a pointer to the hw instance.

Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 741b7b74 26-Feb-2021 Dave Ertman <david.m.ertman@intel.com>

ice: remove DCBNL_DEVRESET bit from PF state

The original purpose of the ICE_DCBNL_DEVRESET was to protect
the driver during DCBNL device resets. But, the flow for
DCBNL device resets now consists of only calls up the stack
such as dev_close() and dev_open() that will result in NDO calls
to the driver. These will be handled with state changes from the
stack. Also, there is a problem of the dev_close and dev_open
being blocked by checks for reset in progress also using the
ICE_DCBNL_DEVRESET bit.

Since the ICE_DCBNL_DEVRESET bit is not necessary for protecting
the driver from DCBNL device resets and it is actually blocking
changes coming from the DCBNL interface, remove the bit from the
PF state and don't block driver function based on DCBNL reset in
progress.

Fixes: b94b013eb626 ("ice: Implement DCBNL support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e95fc857 26-Feb-2021 Krzysztof Goreczny <krzysztof.goreczny@intel.com>

ice: prevent ice_open and ice_stop during reset

There is a possibility of race between ice_open or ice_stop calls
performed by OS and reset handling routine both trying to modify VSI
resources. Observed scenarios:
- reset handler deallocates memory in ice_vsi_free_arrays and ice_open
tries to access it in ice_vsi_cfg_txq leading to driver crash
- reset handler deallocates memory in ice_vsi_free_arrays and ice_close
tries to access it in ice_down leading to driver crash
- reset handler clears port scheduler topology and sets port state to
ICE_SCHED_PORT_STATE_INIT leading to ice_ena_vsi_txq fail in ice_open

To prevent this additional checks in ice_open and ice_stop are
introduced to make sure that OS is not allowed to alter VSI config while
reset is in progress.

Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx")
Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 0d4907f6 20-Nov-2020 Dave Ertman <david.m.ertman@intel.com>

ice: Fix state bits on LLDP mode switch

DCBX_CAP bits were not being adjusted when switching
between SW and FW controlled LLDP.

Adjust bits to correctly indicate which mode the
LLDP logic is in.

Fixes: b94b013eb626 ("ice: Implement DCBNL support")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# b126bd6b 20-Nov-2020 Kiran Patil <kiran.patil@intel.com>

ice: create scheduler aggregator node config and move VSIs

Create set scheduler aggregator node and move for VSIs into respective
scheduler node. Max children per aggregator node is 64.

There are two types of aggregator node(s) created.
1. dedicated node for PF and _CTRL VSIs
2. dedicated node(s) for VFs.

As part of reset and rebuild, aggregator nodes are recreated and VSIs
are moved to respective aggregator node.

Having related VSIs in respective tree avoid starvation between PF and VF
w.r.t Tx bandwidth.

Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Co-developed-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Victor Raj <victor.raj@intel.com>
Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# df006dd4 20-Nov-2020 Dave Ertman <david.m.ertman@intel.com>

ice: Add initial support framework for LAG

Add the framework and initial implementation for receiving and processing
netdev bonding events. This is only the software support and the
implementation of the HW offload for bonding support will be coming at a
later time. There are some architectural gaps that need to be closed
before that happens.

Because this is a software only solution that supports in kernel bonding,
SR-IOV is not supported with this implementation.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# c7a21904 02-Nov-2020 Michal Swiatkowski <michal.swiatkowski@intel.com>

ice: Remove xsk_buff_pool from VSI structure

Current implementation of netdev already contains xsk_buff_pools.
We no longer have to contain these structures in ice_vsi.

Refactor the code to operate on netdev-provided xsk_buff_pools.

Move scheduling napi on each queue to a separate function to
simplify setup function.

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# e94c0df9 29-Sep-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

ice: Replace one-element array with flexible-array member

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

Refactor the code according to the use of a flexible-array member in
struct ice_res_tracker, instead of a one-element array and use the
struct_size() helper to calculate the size for the allocations.

Also, notice that the code below suggests that, currently, two too many
bytes are being allocated with devm_kzalloc(), as the total number of
entries (pf->irq_tracker->num_entries) for pf->irq_tracker->list[] is
_vectors_ and sizeof(*pf->irq_tracker) also includes the size of the
one-element array _list_ in struct ice_res_tracker.

drivers/net/ethernet/intel/ice/ice_main.c:3511:
3511 /* populate SW interrupts pool with number of OS granted IRQs. */
3512 pf->num_avail_sw_msix = (u16)vectors;
3513 pf->irq_tracker->num_entries = (u16)vectors;
3514 pf->irq_tracker->end = pf->irq_tracker->num_entries;

With this change, the right amount of dynamic memory is now allocated
because, contrary to one-element arrays which occupy at least as much
space as a single object of the type, flexible-array members don't
occupy such space in the containing structure.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9-rc1/process/deprecated.html#zero-length-and-one-element-arrays

Built-tested-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# f3fe97f6 21-Jan-2021 Brett Creeley <brett.creeley@intel.com>

ice: Fix MSI-X vector fallback logic

The current MSI-X enablement logic tries to enable best-case MSI-X
vectors and if that fails we only support a bare-minimum set. This
includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
for the OICR interrupt. Unfortunately, the driver fails to load when we
don't get as many MSI-X as requested for a couple reasons.

First, the code to allocate MSI-X in the driver tries to allocate
num_online_cpus() MSI-X for LAN traffic without caring about the number
of MSI-X actually enabled/requested from the kernel for LAN traffic.
So, when calling ice_get_res() for the PF VSI, it returns failure
because the number of available vectors is less than requested. Fix
this by not allowing the PF VSI to allocation more than
pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
Limiting the number of queues is done because we don't want more than
1 Tx/Rx queue per interrupt due to performance conerns.

Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
MSI-X. This is causing a failure when the PF VSI tries to
allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
already been reserved, so there may not be enough MSI-X vectors left.
Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
the failure case.

Update the related defines used in ice_ena_msix_range() to align with
the above behavior and remove the unused RDMA defines because RDMA is
currently not supported. Also, remove the now incorrect comment.

Fixes: 152b978a1f90 ("ice: Rework ice_ena_msix_range")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# bcf68ea1 17-Sep-2020 Nick Nunley <nicholas.d.nunley@intel.com>

ice: Remove vlan_ena from vsi structure

vlan_ena was introduced to track whether VLAN filters are enabled on
the device, but
1) checking for num_vlan > 1 already gives us this information, and is
currently used in this way throughout the code
2) the logic for vlan_ena is broken when multiple VLANs are active

Just remove vlan_ena and use num_vlan instead.

Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 48d40025 07-Oct-2020 Jacob Keller <jacob.e.keller@intel.com>

ice: refactor devlink_port to be per-VSI

Currently, the devlink_port structure is stored within the ice_pf. This
made sense because we create a single devlink_port for each PF. This
setup does not mesh with the abstractions in the driver very well, and
led to a flow where we accidentally call devlink_port_unregister twice
during error cleanup.

In particular, if devlink_port_register or devlink_port_unregister are
called twice, this leads to a kernel panic. This appears to occur during
some possible flows while cleaning up from a failure during driver
probe.

If register_netdev fails, then we will call devlink_port_unregister in
ice_cfg_netdev as it cleans up. Later, we again call
devlink_port_unregister since we assume that we must cleanup the port
that is associated with the PF structure.

This occurs because we cleanup the devlink_port for the main PF even
though it was not allocated. We allocated the port within a per-VSI
function for managing the main netdev, but did not release the port when
cleaning up that VSI, the allocation and destruction are not aligned.

Instead of attempting to manage the devlink_port as part of the PF
structure, manage it as part of the PF VSI. Doing this has advantages,
as we can match the de-allocation of the devlink_port with the
unregister_netdev associated with the main PF VSI.

Moving the port to the VSI is preferable as it paves the way for
handling devlink ports allocated for other purposes such as SR-IOV VFs.

Since we're changing up how we allocate the devlink_port, also change
the indexing. Originally, we indexed the port using the PF id number.
This came from an old goal of sharing a devlink for each physical
function. Managing devlink instances across multiple function drivers is
not workable. Instead, lets set the port number to the logical port
number returned by firmware and set the index using the VSI index
(sometimes referred to as VSI handle).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# b50f7bca 25-Sep-2020 Jesse Brandeburg <jesse.brandeburg@intel.com>

intel-ethernet: clean up W=1 warnings in kdoc

This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.

There are three classes of kdoc warnings fixed:
- cannot understand function prototype: 'x'
- Excess function parameter 'x' description in 'y'
- Function parameter or member 'x' not described in 'y'

All of the changes were trivial comment updates on
function headers.

Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
xargs scripts/kernel-doc -none

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1742b3d5 28-Aug-2020 Magnus Karlsson <magnus.karlsson@intel.com>

xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem

Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com


# a8fffd7a 29-Jul-2020 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: add useful statistics

Display and count some useful hot-path statistics. The usefulness is as
follows:

- tx_restart: use to determine if the transmit ring size is too small or
if the transmit interrupt rate is too low.
- rx_gro_dropped: use to count drops from GRO layer, which previously were
completely uncounted when occurring.
- tx_busy: use to determine when the driver is miscounting number of
descriptors needed for an skb.
- tx_timeout: as our other drivers, count the number of times we've reset
due to timeout because the kernel only prints a warning once per netdev.

Several of these were already counted but not displayed.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# d69ea414 23-Jul-2020 Jacob Keller <jacob.e.keller@intel.com>

ice: implement device flash update via devlink

Use the newly added pldmfw library to implement device flash update for
the Intel ice networking device driver. This support uses the devlink
flash update interface.

The main parts of the flash include the Option ROM, the netlist module,
and the main NVM data. The PLDM firmware file contains modules for each
of these components.

Using the pldmfw library, the provided firmware file will be scanned for
the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
the main NVM module containing the primary device firmware, and
"fw.netlist" containing the netlist module.

The flash is separated into two banks, the active bank containing the
running firmware, and the inactive bank which we use for update. Each
module is updated in a staged process. First, the inactive bank is
erased, preparing the device for update. Second, the contents of the
component are copied to the inactive portion of the flash. After all
components are updated, the driver signals the device to switch the
active bank during the next EMP reset (which would usually occur during
the next reboot).

Although the firmware AdminQ interface does report an immediate status
for each command, the NVM erase and NVM write commands receive status
asynchronously. The driver must not continue writing until previous
erase and write commands have finished. The real status of the NVM
commands is returned over the receive AdminQ. Implement a simple
interface that uses a wait queue so that the main update thread can
sleep until the completion status is reported by firmware. For erasing
the inactive banks, this can take quite a while in practice.

To help visualize the process to the devlink application and other
applications based on the devlink netlink interface, status is reported
via the devlink_flash_update_status_notify. While we do report status
after each 4k block when writing, there is no real status we can report
during erasing. We simply must wait for the complete module erasure to
finish.

With this implementation, basic flash update for the ice hardware is
supported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# b4e813dd 09-Jul-2020 Bruce Allan <bruce.w.allan@intel.com>

ice: support Total Port Shutdown on devices that support it

When the Port Disable bit is set in the Link Default Override Mask TLV PFA
module in the NVM, Total Port Shutdown mode is supported and enabled. In
this mode, the driver should act as if the link-down-on-close ethtool
private flag is always enabled and dis-allow any change to that flag.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# ea78ce4d 09-Jul-2020 Paul Greenwalt <paul.greenwalt@intel.com>

ice: add link lenient and default override support

Adds functions to check for link override firmware support and get
the override settings for a port. The previously supported/default link
mode was strict mode.

In strict mode link is configured based on get PHY capabilities PHY types
with media.

Lenient mode is now the default link mode. In lenient mode the link is
configured based on get PHY capabilities PHY types without media. This
allows the user to configure link that the media does not report. Limit the
minimum supported link mode to 25G for devices that support 100G, and 1G
for devices that support less than 100G.

Default override is only supported in lenient mode. If default override
is supported and enabled, then default override values are used for
configuring speed and FEC. Default override provide persistent link
settings in the NVM.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Evan Swanson <evan.swanson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 1a3571b5 09-Jul-2020 Paul Greenwalt <paul.greenwalt@intel.com>

ice: restore PHY settings on media insertion

After the transition from no media to media FW will clear the
set-phy-cfg data set by the user. Save initial PHY settings and any
settings later requested by the user and use that data to restore PHY
settings on media insertion. Since PHY configuration is now being stored,
replace calls that were calling FW to get the configuration with the saved
copy.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 769c500d 09-Jul-2020 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Add advanced power mgmt for WoL

Add callbacks needed to support advanced power management for Wake on LAN.
Also make ice_pf_state_is_nominal function available for all configurations
not just CONFIG_PCI_IOV.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 8d7aab35 18-Jun-2020 Jacob Keller <jacob.e.keller@intel.com>

ice: implement snapshot for device capabilities

Add a new devlink region used for capturing a snapshot of the device
capabilities buffer which is reported by the firmware over the AdminQ.
This information can useful in debugging driver and firmware
interactions.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


# 34a2a3b8 29-May-2020 Jeff Kirsher <jeffrey.t.kirsher@intel.com>

net/intel: remove driver versions from Intel drivers

As with other networking drivers, remove the unnecessary driver version
from the Intel drivers. The ethtool driver information and module version
will then report the kernel version instead.

For ixgbe, i40e and ice drivers, the driver passes the driver version to
the firmware to confirm that we are up and running. So we now pass the
value of UTS_RELEASE to the firmware. This adminq call is required per
the HAS document. The Device then sends an indication to the BMC that the
PF driver is present. This is done using Host NC Driver Status Indication
in NC-SI Get Link command or via the Host Network Controller Driver Status
Change AEN.

What the BMC may do with this information is implementation-dependent, but
this is a standard NC-SI 1.1 command we honor per the HAS.

CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Alek Loktionov <aleksandr.loktionov@intel.com>
CC: Kevin Liedtke <kevin.d.liedtke@intel.com>
CC: Aaron Rowden <aaron.f.rowden@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>


# 28bf2672 11-May-2020 Brett Creeley <brett.creeley@intel.com>

ice: Implement aRFS

Enable accelerated Receive Flow Steering (aRFS). It is used to steer Rx
flows to a specific queue. This functionality is triggered by the network
stack through ndo_rx_flow_steer and requires Flow Director (ntuple on) to
function.

The fltr_info is used to add/remove/update flow rules in the HW, the
fltr_state is used to determine what to do with the filter with respect
to HW and/or SW, and the flow_id is used in co-ordination with the
network stack.

The work for aRFS is split into two paths: the ndo_rx_flow_steer
operation and the ice_service_task. The former is where the kernel hands
us an Rx SKB among other items to setup aRFS and the latter is where
the driver adds/updates/removes filter rules from HW and updates filter
state.

In the Rx path the following things can happen:
1. New aRFS entries are added to the hash table and the state is
set to ICE_ARFS_INACTIVE so the filter can be updated in HW
by the ice_service_task path.
2. aRFS entries have their Rx Queue updated if we receive a
pre-existing flow_id and the filter state is ICE_ARFS_ACTIVE.
The state is set to ICE_ARFS_INACTIVE so the filter can be
updated in HW by the ice_service_task path.
3. aRFS entries marked as ICE_ARFS_TODEL are deleted

In the ice_service_task path the following things can happen:
1. New aRFS entries marked as ICE_ARFS_INACTIVE are added or
updated in HW.
and their state is updated to ICE_ARFS_ACTIVE.
2. aRFS entries are deleted from HW and their state is updated
to ICE_ARFS_TODEL.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 83af0039 11-May-2020 Henry Tieman <henry.w.tieman@intel.com>

ice: Restore filters following reset

Following a reset, Flow Director filters are cleared from the hardware.
Rebuild the filters using the software structures containing the filter
rules.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# cac2a27c 11-May-2020 Henry Tieman <henry.w.tieman@intel.com>

ice: Support IPv4 Flow Director filters

Support the addition and deletion of IPv4 filters.

Supported fields are: src-ip, dst-ip, src-port, and dst-port
Supported flow-types are: tcp4, udp4, sctp4, ip4

Example usage:

ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.55 dst-ip 172.16.0.55 \
src-port 16 dst-port 12 action 32

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 4ab95646 11-May-2020 Henry Tieman <henry.w.tieman@intel.com>

ice: Support displaying ntuple rules

Add functionality for ethtool --show-ntuple, allowing for filters to be
displayed when set functionality is added. Add statistics related to
Flow Director matches and status.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 148beb61 11-May-2020 Henry Tieman <henry.w.tieman@intel.com>

ice: Initialize Flow Director resources

Flow Director allows for redirection based on ntuple rules. Rules are
programmed using the ethtool set-ntuple interface. Supported actions are
redirect to queue and drop.

Setup the initial framework to process Flow Director filters. Create and
allocate resources to manage and program filters to the hardware. Filters
are processed via a sideband interface; a control VSI is created to manage
communication and process requests through the sideband. Upon allocation of
resources, update the hardware tables to accept perfect filters.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 53bb6698 07-May-2020 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: cleanup vf_id signedness

The vf_id variable is dealt with in the code in inconsistent
ways of sign usage, preventing compilation with -Werror=sign-compare.
Fix this problem in the code by always treating vf_id as unsigned, since
there are no valid values of vf_id that are negative.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 88865fc4 07-May-2020 Karol Kolacinski <karol.kolacinski@intel.com>

ice: Fix casting issues

Change min() macros to min_t() which has compare type specified and it
helps avoid precision loss.

In some cases there was precision loss during calls or assignments.
Some fields in structs were unnecessarily large and gave multiple
warnings.

There were also some minor type differences which are now fixed as well as
some cases where a simple cast was needed.

Callers were were passing data that is a u16 to
ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8.
Fix that by changing the function to take a u16.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0fee3577 07-May-2020 Lihong Yang <lihong.yang@intel.com>

ice: Provide more meaningful error message

When printing the ice status or AQ error codes, instead of printing out the
numerical value, provide the description of the error code. This provides
more info about the issue than a number.

Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 01b5e89a 07-May-2020 Brett Creeley <brett.creeley@intel.com>

ice: Add VF promiscuous support

Implement promiscuous support for VF VSIs. Behaviour of promiscuous support
is based on VF trust as well as the, introduced, vf-true-promisc flag.

A trusted VF with vf-true-promisc disabled will be the default VSI, which
means that all traffic without a matching destination MAC address in the
device's internal switch will be forwarded to this VF VSI.

A trusted VF with vf-true-promisc enabled will go into "true promiscuous
mode". This amounts to the VF receiving all ingress and egress traffic
that hits the device's internal switch.

An untrusted VF will only receive traffic destined for that VF.

The vf-true-promisc-support flag cannot be toggled while any VF is in
promiscuous mode. This flag should be set prior to loading the iavf driver
or spawning VF(s).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# a4e82a81 06-May-2020 Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add support for tunnel offloads

Create a boost TCAM entry for each tunnel port in order to get a tunnel
PTYPE. Update netdev feature flags and implement the appropriate logic to
get and set values for hardware offloads.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# dce730f1 26-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

ice: add a devlink region for dumping NVM contents

Add a devlink region for exposing the device's Non Volatime Memory flash
contents.

Support the recently added .snapshot operation, enabling userspace to
request a snapshot of the NVM contents via DEVLINK_CMD_REGION_NEW.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 1adf7ead 11-Mar-2020 Jacob Keller <jacob.e.keller@intel.com>

ice: enable initial devlink support

Begin implementing support for the devlink interface with the ice
driver.

The pf structure is currently memory managed through devres, via
a devm_alloc. To mimic this behavior, after allocating the devlink
pointer, use devm_add_action to add a teardown action for releasing the
devlink memory on exit.

The ice hardware is a multi-function PCIe device. Thus, each physical
function will get its own devlink instance. This means that each
function will be treated independently, with its own parameters and
configuration. This is done because the ice driver loads a separate
instance for each function.

Due to this, the implementation does not enable devlink to manage
device-wide resources or configuration, as each physical function will
be treated independently. This is done for simplicity, as managing
a devlink instance across multiple driver instances would significantly
increase the complexity for minimal gain.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# c8a1071d 27-Feb-2020 Lukasz Czapnik <lukasz.czapnik@intel.com>

ice: Increase mailbox receive queue length to maximum

Currently the PF's mailbox receive queue is only 512 entries. This fine,
but considering that all VF's mailbox send queues funnel into the PF's
single mailbox receive queue, let's increase it to the maximum size. This
will help prevent any possible bottleneck/slowdown occurring from the PF's
mailbox receive queue being full.

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f844d521 27-Feb-2020 Brett Creeley <brett.creeley@intel.com>

ice: Fix removing driver while bare-metal VFs pass traffic

Currently, if there are bare-metal VFs passing traffic and the ice
driver is removed, there is a possibility of VFs triggering a Tx timeout
right before iavf_remove(). This is causing iavf_close() to not be
called because there is a check in the beginning of iavf_remove() that
bails out early if (adapter->state < IAVF_DOWN_PENDING). This makes it
so some resources do not get cleaned up. Specifically, free_irq()
is never called for data interrupts, which results in the following line
of code to trigger:

pci_disable_msix()
free_msi_irqs()
...
BUG_ON(irq_has_action(entry->irq + i));
...

To prevent the Tx timeout from occurring on the VF during driver unload
for ice and the iavf there are a few changes that are needed.

[1] Don't disable all active VF Tx/Rx queues prior to calling
pci_disable_sriov.

[2] Call ice_free_vfs() before disabling the service task.

[3] Disable VF resets when the ice driver is being unloaded by setting
the pf->state flag __ICE_VF_RESETS_DISABLED.

Changing [1] and [2] allow each VF driver's remove flow to successfully
send VIRTCHNL requests, which includes queue disable. This prevents
unexpected Tx timeouts because the PF driver is no longer forcefully
disabling queues.

Due to [1] and [2] there is a possibility that the PF driver will get a
VFLR or reset request over VIRTCHNL from a VF during PF driver unload.
Prevent that by doing [3].

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 46c276ce 27-Feb-2020 Brett Creeley <brett.creeley@intel.com>

ice: Improve clarity of prints and variables

Currently when the device runs out of MSI-X interrupts a cryptic and
unhelpful message is printed. This will cause confusion when hitting this
case. Fix this by clearing up the error message for both SR-IOV and non
SR-IOV use cases.

Also, make a few minor changes to increase clarity of variables.
1. Change per VF MSI-X and queue pair variables in the PF structure.
2. Use ICE_NONQ_VECS_VF when determining pf->num_msix_per_vf instead of
the magic number "1". This vector is reserved for the OICR.

All of the resource tracking functions were moved to avoid adding
any forward declaration function prototypes.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0ca469fb 27-Feb-2020 Mitch Williams <mitch.a.williams@intel.com>

ice: allow bigger VFs

Unlike the XL710 series, 800-series hardware can allocate more than 4
MSI-X vectors per VF. This patch enables that functionality. We
dynamically allocate vectors and queues depending on how many VFs are
enabled. Allocating the maximum number of VFs replicates XL710
behavior with 4 queues and 4 vectors. But allocating a smaller number
of VFs will give you 16 queues and 16 vectors.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9d5c5a52 13-Feb-2020 Paul Greenwalt <paul.greenwalt@intel.com>

ice: update malicious driver detection event handling

Update the PF VFs MDD event message to rate limit once per second and
report the total number Rx|Tx event count. Add support to print pending
MDD events that occur during the rate limit. The use of net_ratelimit did
not allow for per VF Rx|Tx granularity.

Additional PF MDD log messages are guarded by netif_msg_[rx|tx]_err().

Since VF RX MDD events disable the queue, add ethtool private flag
mdd-auto-reset-vf to configure VF reset to re-enable the queue.

Disable anti-spoof detection interrupt to prevent spurious events
during a function reset.

To avoid race condition do not make PF MDD register reads conditional
on global MDD result.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 65bb559b 12-Dec-2019 Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>

ice: Add a boundary check in ice_xsk_umem()

In ice_xsk_umem(), variable qid which is later used as an array index,
is not validated for a possible boundary exceedance. Because of that,
a calling function might receive an invalid address, which causes
general protection fault when dereferenced.

To address this, add a boundary check to see if qid is greater than the
size of a UMEM array. Also, don't let user change vsi->num_xsk_umems
just by trying to setup a second UMEM if its value is already set up
(i.e. UMEM region has already been allocated for this VSI).

While at it, make sure that ring->zca.free pointer is always zeroed out
if there is no UMEM on a specified ring.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# fc0f39bc 12-Dec-2019 Brett Creeley <brett.creeley@intel.com>

ice: Add code to keep track of current dflt_vsi

We can't have more than one default VSI so prevent another VSI from
overwriting the current dflt_vsi. This was achieved by adding the
following functions:

ice_is_dflt_vsi_in_use()
- Used to check if the default VSI is already being used.

ice_is_vsi_dflt_vsi()
- Used to check if VSI passed in is in fact the default VSI.

ice_set_dflt_vsi()
- Used to set the default VSI via a switch rule

ice_clear_dflt_vsi()
- Used to clear the default VSI via a switch rule.

Also, there was no need to introduce any locking because all mailbox
events and synchronization of switch filters for the PF happen in the
service task.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# cd6d6b83 12-Dec-2019 Brett Creeley <brett.creeley@intel.com>

ice: Fix VF spoofchk

There are many things wrong with the function
ice_set_vf_spoofchk().

1. The VSI being modified is the PF VSI, not the VF VSI.
2. We are enabling Rx VLAN pruning instead of Tx VLAN anti-spoof.
3. The spoofchk setting for each VF is not initialized correctly
or re-initialized correctly on reset.

To fix [1] we need to make sure we are modifying the VF VSI.
This is done by using the vf->lan_vsi_idx to index into the PF's
VSI array.

To fix [2] replace setting Rx VLAN pruning in ice_set_vf_spoofchk()
with setting Tx VLAN anti-spoof.

To Fix [3] we need to make sure the initial VSI settings match what
is done in ice_set_vf_spoofchk() for spoofchk=on. Also make sure
this also works for VF reset. This was done by modifying ice_vsi_init()
to account for the current spoofchk state of the VF VSI.

Because of these changes, Tx VLAN anti-spoof needs to be removed
from ice_cfg_vlan_pruning(). This is okay for the VF because this
is now controlled from the admin enabling/disabling spoofchk. For the
PF, Tx VLAN anti-spoof should not be set. This change requires us to
call ice_set_vf_spoofchk() when configuring promiscuous mode for
the VF which requires ice_set_vf_spoofchk() to move in order to prevent
a forward declaration prototype.

Also, add VLAN 0 by default when allocating a VF since the PF is unaware
if the guest OS is running the 8021q module. Without this, MDD events will
trigger on untagged traffic because spoofcheck is enabled by default. Due
to this change, ignore add/delete messages for VLAN 0 from VIRTCHNL since
this is added/deleted during VF initialization/teardown respectively and
should not be modified.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 87324e74 08-Nov-2019 Henry Tieman <henry.w.tieman@intel.com>

ice: Implement ethtool ops for channels

Add code to query and set the number of channels on the primary VSI for a
PF. This is accessed from the 'ethtool -l' and 'ethtool -L' commands,
respectively. Though the ice driver supports asymmetric queues report an
IRQ vector that has both Rx and Tx queues attached and is counted as a
'combined' channel.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 4015d11e 08-Nov-2019 Brett Creeley <brett.creeley@intel.com>

ice: Add ice_pf_to_dev(pf) macro

We use &pf->dev->pdev all over the code. Add a simple
macro to do this for us. When multiple de-references
like this are being done add a local struct device
variable.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# b94b013e 06-Nov-2019 Dave Ertman <david.m.ertman@intel.com>

ice: Implement DCBNL support

Implement interface layer for the DCBNL subsystem. These are the functions
to support the callbacks defined in the dcbnl_rtnl_ops struct. These
callbacks are going to be used to interface with the DCB settings of the
device. Implementation of dcb_nl set functions and supporting SW DCB
functions.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9d614b64 06-Nov-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Use ice_ena_vsi and ice_dis_vsi in DCB configuration flow

DCB configuration flow needs to disable and enable only the PF (main)
VSI, so use ice_ena_vsi and ice_dis_vsi. To avoid the use of ifdef to
control the staticness of these functions, move them to ice_lib.c.

Also replace the allocate and copy of old_cfg to kmemdup() in
ice_pf_dcb_cfg().

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 7237f5b0 24-Oct-2019 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: introduce legacy Rx flag

Add an ethtool "legacy-rx" priv flag for toggling the Rx path. This
control knob will be mainly used for build_skb usage as well as buffer
size/MTU manipulation.

In preparation for adding build_skb support in a way that it takes
care of how we set the values of max_frame and rx_buf_len fields of
struct ice_vsi. Specifically, in this patch mentioned fields are set to
values that will allow us to provide headroom and tailroom in-place.

This can be mostly broken down onto following:
- for legacy-rx "on" ethtool control knob, old behaviour is kept;
- for standard 1500 MTU size configure the buffer of size 1536, as
network stack is expecting the NET_SKB_PAD to be provided and
NET_IP_ALIGN can have a non-zero value (these can be typically equal
to 32 and 2, respectively);
- for larger MTUs go with max_frame set to 9k and configure the 3k
buffer in case when PAGE_SIZE of underlying arch is less than 8k; 3k
buffer is implying the need for order 1 page, so that our page
recycling scheme can still be applied;

With that said, substitute the hardcoded ICE_RXBUF_2048 and PAGE_SIZE
values in DMA API that we're making use of with rx_ring->rx_buf_len and
ice_rx_pg_size(rx_ring). The latter is an introduced helper for
determining the page size based on its order (which was figured out via
ice_rx_pg_order). Last but not least, take care of truesize calculation.

In the followup patch the headroom/tailroom computation logic will be
introduced.

This change aligns the buffer and frame configuration with other Intel
drivers, most importantly with iavf.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 2d4238f5 04-Nov-2019 Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>

ice: Add support for AF_XDP

Add zero copy AF_XDP support. This patch adds zero copy support for
Tx and Rx; code for zero copy is added to ice_xsk.h and ice_xsk.c.

For Tx, implement ndo_xsk_wakeup. As with other drivers, reuse
existing XDP Tx queues for this task, since XDP_REDIRECT guarantees
mutual exclusion between different NAPI contexts based on CPU ID. In
turn, a netdev can XDP_REDIRECT to another netdev with a different
NAPI context, since the operation is bound to a specific core and each
core has its own hardware ring.

For Rx, allocate frames as MEM_TYPE_ZERO_COPY on queues that AF_XDP is
enabled.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# efc2214b 04-Nov-2019 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: Add support for XDP

Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit. Upon load of
an XDP program, allocate additional Tx rings for dedicated XDP use.
The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT,
XDP_PASS, and XDP_ABORTED.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# eff380aa 24-Oct-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Introduce ice_base.c

Remove a few uses of kernel configuration flags from ice_lib.c by
introducing a new source file ice_base.c. Also move corresponding
function prototypes from ice_lib.h to ice_base.h and include ice_base.h
where required.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 462acf6a 09-Sep-2019 Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Enable DDP package download

Attempt to request an optional device-specific DDP package file
(one with the PCIe Device Serial Number in its name so that different DDP
package files can be used on different devices). If the optional package
file exists, download it to the device. If not, download the default
package file.

Log an appropriate message based on whether or not a DDP package
file exists and the return code from the attempt to download it to the
device. If the download fails and there is not already a package file on
the device, go into "Safe Mode" where some features are not supported.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 870f805e 09-Sep-2019 Lukasz Czapnik <lukasz.czapnik@intel.com>

ice: Fix FW version formatting in dmesg

The FW build id is currently being displayed as an int which doesn't make
sense. Instead display FW build id as a hex value. Also add other useful
information to the output such as NVM version, API patch info, and FW
build hash.

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# e3710a01 09-Sep-2019 Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>

ice: send driver version to firmware

The driver is required to send a version to the firmware
to indicate that the driver is up. If the driver doesn't
do this the firmware doesn't behave properly.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# dd47e1fd 03-Sep-2019 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: change default number of receive descriptors

The driver should start out with a reasonable number of descriptors that
can prevent drops due to a CPU being in a power management state.
Change the default number of descriptors to 2048.
The user can always change the value at runtime. Transmit descriptor
counts are not modified because they don't need to change due to the
speed of the interface, or for power managed CPUs, but the code is
simplified to a fixed value for the transmit default.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 8c243700 03-Sep-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Minor refactor in queue management

Remove q_left_tx and q_left_rx from the PF struct as these can be
obtained by calling ice_get_avail_txq_count and ice_get_avail_rxq_count
respectively.

The function ice_determine_q_usage is only setting num_lan_tx and
num_lan_rx in the PF structure, and these are later assigned to
vsi->alloc_txq and vsi->alloc_rxq respectively. This is an unnecessary
indirection, so remove ice_determine_q_usage and just assign values
for vsi->alloc_txq and vsi->alloc_rxq in ice_vsi_set_num_qs and use
these to set num_lan_tx and num_lan_rx respectively.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9d56b7fd 08-Aug-2019 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: change work limit to a constant

The driver has supported a transmit work limit
that was configurable from ethtool for a long time, but
there are no good use cases for having it be a variable
that can be changed at run time. In addition, this
variable was noted to be causing performance overhead
due to cache misses.

Just remove the variable and let the code use a constant
so that the functionality is maintained (a limit on the
number of transmits that will be cleaned in any one call
to the clean routines) without the cache miss.

Removes code, removes a variable, removes testing surface. Yay.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 208ff751 08-Aug-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add ice_get_main_vsi to get PF/main VSI

There are multiple places where we currently use ice_find_vsi_by_type
to get the PF (a.k.a. main) VSI. The PF VSI by definition is always
the first element in the pf->vsi array (i.e. pf->vsi[0]). So instead
add and use a new helper function ice_get_main_vsi, which just returns
pf->vsi[0].

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 78b5713a 02-Aug-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Alloc queue management bitmaps and arrays dynamically

The total number of queues available on the device is divided between
multiple physical functions (PF) in the firmware and provided to the
driver when it gets function capabilities from the firmware. Thus
each PF knows how many Tx/Rx queues it has. These queues are then
doled out to different VSIs (for LAN traffic, SR-IOV VF traffic, etc.)

To track usage of these queues at the PF level, the driver uses two
bitmaps avail_txqs and avail_rxqs. At the VSI level (i.e. struct ice_vsi
instances) the driver uses two arrays txq_map and rxq_map, to track
ownership of VSIs' queues in avail_txqs and avail_rxqs respectively.

The aforementioned bitmaps and arrays should be allocated dynamically,
because the number of queues supported by a PF is only available once
function capabilities have been queried. The current static allocation
consumes way more memory than required.

This patch removes the DECLARE_BITMAP for avail_txqs and avail_rxqs
and instead uses bitmap_zalloc to allocate the bitmaps during init.
Similarly txq_map and rxq_map are now allocated in ice_vsi_alloc_arrays.
As a result ICE_MAX_TXQS and ICE_MAX_RXQS defines are no longer needed.
Also as txq_map and rxq_map are now allocated and freed, some code
reordering was required in ice_vsi_rebuild for correct functioning.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 5a4a8673 25-Jul-2019 Bruce Allan <bruce.w.allan@intel.com>

ice: update ethtool stats on-demand

Users expect ethtool statistics to be updated on-demand when invoking
'ethtool -S <iface>' instead of providing a snapshot of statistics taken
once a second (the frequency of the watchdog task where stats are currently
updated). Update stats every time 'ethtool -S <iface>' is run.

Also, fix an indentation style issue and an unnecessary local variable
initialization in ice_get_ethtool_stats() discovered while investigating
the subject issue.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 84a118ab 29-Jul-2019 Dave Ertman <david.m.ertman@intel.com>

ice: Rename ethtool private flag for lldp

The current flag name of "enable-fw-lldp" is a bit cumbersome.

Change priv-flag name to "fw-lldp-agent" with a value of on or
off. This is more straight-forward in meaning.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# c275684b 25-Jul-2019 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Move VF resources definition to SR-IOV specific file

In order to use some of the VF resources definition in the SR-IOV specific
virtchnl header file, this patch moves applicable code to
ice_virtchnl_pf.h file accordingly... and they should have been defined in
the destination file originally.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 11836214 25-Jul-2019 Brett Creeley <brett.creeley@intel.com>

ice: Increase size of Mailbox receive queue for many VFs

Currently we use the ICE_MBXQ_LEN for both the Mailbox send and receive
queues that are used to communicate with VFs. This is fine for the send
queue because the PF driver will lock the queue for every single send,
but for the Mailbox receive queue every VF is posting to its Mailbox
send queue and the hardware is then handing the message to the PF on its
Mailbox receive queue. This becomes a problem with many VFs because it
seems to overburden the Mailbox receive queue on the PF. Fix this by
increasing the Mailbox receive queue for the PF to 512 entries.

The number 512 was determined based on the number of VFs supported by
the device. We can have a total of 256 VFs so in the worst case this
allows the VFs to put 2 messages in the PFs Mailbox receive queue at the
same time.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# d82dd83d 25-Jul-2019 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Restructure VFs initialization flows

This patch restructures how VFs are configured, and resources allocated.
Instead of freeing resources that were never allocated, and resetting
empty VFs that have never been created - the new flow will just allocate
resources for number of requested VFs based on the availability.

During VFs initialization process, global interrupt is disabled, and
rearmed after getting MSIX vectors for VFs. This allows immediate mailbox
communications, instead of delaying it till later and VFs.
PF communications resulted to using polling instead of actual interrupt.
The issue manifested when creating higher number of VFs (128 VFs) per PF.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# ba880734 26-Jun-2019 Brett Creeley <brett.creeley@intel.com>

ice: Remove unnecessary flag ICE_FLAG_MSIX_ENA

This flag is not needed and is called every time we re-enable interrupts
in the hotpath so remove it. Also remove ice_vsi_req_irq() because it
was a wrapper function for ice_vsi_req_irq_msix() whose sole purpose was
checking the ICE_FLAG_MSIX_ENA flag.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 6d599946 26-Jun-2019 Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Do not configure port with no media

The firmware reports an error when trying to configure a port with no
media. Instead of always configuring the port, check for media before
attempting to configure it. In the absence of media, turn off link and
poll for media to become available before re-enabling link.

Move ice_force_phys_link_state() up to avoid forward declaration.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 8be92a76 16-Apr-2019 Preethi Banala <preethi.banala@intel.com>

ice: Change minimum descriptor count value for Tx/Rx rings

Change minimum number of descriptor count from 32 to 64. This is to have
a feature parity with previous Intel NIC drivers.

Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# cbe66bfe 16-Apr-2019 Brett Creeley <brett.creeley@intel.com>

ice: Refactor interrupt tracking

Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x
entries (sw_irq_tracker) and one for hardware MSI-x vectors
(hw_irq_tracker). Generally the sw_irq_tracker has less entries than the
hw_irq_tracker because the hw_irq_tracker has entries equal to the max
allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non
SR-IOV portion of the vectors, kernel granted IRQs). All of the non
SR-IOV portions of the driver (i.e. LAN queues, RDMA queues, OICR, etc.)
take at least one of each type of tracker resource. SR-IOV only grabs
entries from the hw_irq_tracker. There are a few issues with this approach
that can be seen when doing any kind of device reconfiguration (i.e.
ethtool -L, SR-IOV, etc.). One of them being, any time the driver creates
an ice_q_vector and associates it to a LAN queue pair it will grab and
use one entry from the hw_irq_tracker and one from the sw_irq_tracker.
If the indices on these does not match it will cause a Tx timeout, which
will cause a reset and then the indices will match up again and traffic
will resume. The mismatched indices come from the trackers not being the
same size and/or the search_hint in the two trackers not being equal.
Another reason for the refactor is the co-existence of features with
SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end
of the sw_irq_tracker then other features can no longer use this space
because the hardware has now given the remaining interrupts to SR-IOV.

This patch reworks how we track MSI-x vectors by removing the
hw_irq_tracker completely and instead MSI-x resources needed for SR-IOV
are determined all at once instead of per VF. This can be done because
when creating VFs we know how many are wanted and how many MSI-x vectors
each VF needs. This also allows us to start using MSI-x resources from
the end of the PF's allowed MSI-x vectors so we are less likely to use
entries needed for other features (i.e. RDMA, L2 Offload, etc).

This patch also reworks the ice_res_tracker structure by removing the
search_hint and adding a new member - "end". Instead of having a
search_hint we will always search from 0. The new member, "end", will be
used to manipulate the end of the ice_res_tracker (specifically
sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV.
In the normal case, the end of ice_res_tracker will be equal to the
ice_res_tracker's num_entries.

The sriov_base_vector member was added to the PF structure. It is used
to represent the starting MSI-x index of all the needed MSI-x vectors
for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may
have to take resources from the sw_irq_tracker. This is done by setting
the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all
SR-IOV VFs are removed then the sw_irq_tracker->end is reset back to
sw_irq_tracker->num_entries. The sriov_base_vector, along with the VF's
number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on
the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to
calculate the first HW absolute MSI-x index for each VF, which is used
to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to
program the VFs MSI-x PCI configuration bits. Also, the sriov_base_vector
is used along with VF's num_vf_msix, vf_id, and q_vector->v_idx to
determine the MSI-x register index (used for writing to GLINT_DYN_CTL)
within the PF's space.

Interrupt changes removed any references to hw_base_vector, hw_oicr_idx,
and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker
variables remain. Change all of these by removing the "sw_" prefix to
help avoid confusion with these variables and their use.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0e674aeb 16-Apr-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add handler for ethtool selftest

This patch adds a handler for ethtool selftest. Selftest includes
testing link, interrupts, eeprom, registers and packet loopback.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 1aec6e1b 16-Apr-2019 Brett Creeley <brett.creeley@intel.com>

ice: Set minimum default Rx descriptor count to 512

Currently we set the default number of Rx descriptors per
queue to the system's page size divided by the number of bytes per
descriptor. For 4K page size systems this is resulting in 128 Rx
descriptors per queue. This is causing more dropped packets than desired
in the default configuration. Fix this by setting the minimum default
Rx descriptor count per queue to 512.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# aa6ccf3f 16-Apr-2019 Brett Creeley <brett.creeley@intel.com>

ice: Fix couple of issues in ice_vsi_release

Currently the driver is calling ice_napi_del() and then
unregister_netdev(). The call to unregister_netdev() will result in a
call to ice_stop() and then ice_vsi_close(). This is where we call
napi_disable() for all the MSI-X vectors. This flow is reversed so make
the changes to ensure napi_disable() happens prior to napi_del().

Before calling napi_del() and free_netdev() make sure
unregister_netdev() was called. This is done by making sure the
__ICE_DOWN bit is set in the vsi->state for the interested VSI.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0ab54c5f 16-Apr-2019 Jesse Brandeburg <jesse.brandeburg@intel.com>

ice: Use bitfields when possible

We can use bit fields to store boolean values and when the
bit fields are next to each other, the compiler will combine them
(as long as the size holds enough).

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 31eafa40 16-Apr-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Implement LLDP persistence

Implement LLDP persistence across reboots, start and stop of LLDP agent.
Add additional parameter to ice_aq_start_lldp and ice_aq_stop_lldp.

Also change the ethtool private flag from "disable-fw-lldp" to
"enable-fw-lldp". This change will flip the boolean logic of the
functionality of the flag (on = enable, off = disable). The change
in name and functionality is to differentiate between the
pre-persistence and post-persistence states.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# d95276ce 16-Apr-2019 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Add function to program ethertype based filter rule on VSIs

This patch adds function to program VSI with ethertype based filter rule,
so that all flow control frames would be disallowed from being transmitted
to the client, in order to prevent malicious VSI, especially VF from
sending out PAUSE or PFC frames, and then control other VSIs traffic.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# ba0db585 16-Apr-2019 Michal Swiatkowski <michal.swiatkowski@intel.com>

ice: Add more validation in ice_vc_cfg_irq_map_msg

Add few checks to validate msg from iavf driver.

Test if we have got enough q_vectors allocated in VSI connected with VF.
Add masks for itr_indx and msix_indx to avoid writing to reserved fieldi
of QINT. Clear q_vector->num_ring_rx/tx, without it we can increment this
value every time we send irq map msg from VF. So after second call this
value will be incorrect.

Decrement num_vectors from msg, because last vector in iavf msg is misc
vector (we don't set map for it).

Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# c2a23e00 28-Feb-2019 Brett Creeley <brett.creeley@intel.com>

ice: Refactor link event flow

Currently the link event flow works, but can be much better.
Refactor the link event flow to make it cleaner and more clear
on what is going on.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# b07833a0 28-Feb-2019 Brett Creeley <brett.creeley@intel.com>

ice: Add reg_idx variable in ice_q_vector structure

Every time we want to re-enable interrupts and/or write to a register
that requires an interrupt vector's hardware index we do the following:

vsi->hw_base_vector + q_vector->v_idx

This is a wasteful operation, especially in the hot path. Fix this by
adding a u16 reg_idx member to the ice_q_vector structure and make the
necessary changes to make this work.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 3a257a14 28-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add code to control FW LLDP and DCBX

This patch adds code to start or stop LLDP and DCBX in firmware through
use of ethtool private flags.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 7b9ffc76 28-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add code for DCB initialization part 3/4

This patch adds a new function ice_pf_dcb_cfg (and related helpers)
which applies the DCB configuration obtained from the firmware. As
part of this, VSIs/netdevs are updated with traffic class information.

This patch requires a bit of a refactor of existing code.

1. For a MIB change event, the associated VSI is closed and brought up
again. The gap between closing and opening the VSI can cause a race
condition. Fix this by grabbing the rtnl_lock prior to closing the
VSI and then only free it after re-opening the VSI during a MIB
change event.

2. ice_sched_query_elem is used in ice_sched.c and with this patch, in
ice_dcb.c as well. However, ice_dcb.c is not built when CONFIG_DCB is
unset. This results in namespace warnings (ice_sched.o: Externally
defined symbols with no external references) when CONFIG_DCB is unset.
To avoid this move ice_sched_query_elem from ice_sched.c to
ice_common.c.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 37b6f646 28-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add code for DCB initialization part 1/4

This patch introduces a skeleton for ice_init_pf_dcb, the top level
function for DCB initialization. Subsequent patches will add to this
DCB init flow.

In this patch, ice_init_pf_dcb checks if DCB is a supported capability.
If so, an admin queue call to start the LLDP and DCBx in firmware is
issued. If not, an error is reported. Note that we don't fail the driver
init if DCB init fails.

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f9867df6 19-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Fix incorrect use of abbreviations

Capitalize abbreviations and spell out some that aren't obvious.

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 94c4441b 19-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Fix typos in code comments

This patch fixes typos in code comments.

Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 8244dd2d 19-Feb-2019 Brett Creeley <brett.creeley@intel.com>

ice: Audit hotpath structures with pahole

Currently the ice_q_vector structure and ice_ring_container structure
are taking up more space than necessary due to cache alignment holes
and unnecessary variables respectively. This is not helping the
driver's performance. The following fixes were done to improve cache
alignment, reduce wasted space, and increase performance.

1. Remove the ice_latency_range enum as it is unused.
2. Remove the latency_range variable in the ice_ring_container structure.
3. Change the size of the itr_idx in the ice_ring_container structure
from an int to an u16. This reduced the size of ice_ring_container
structure to 32 Bytes so it has no holes or padding.
4. Re-arrange the ice_q_vector structure using pahole to align
members as best as possible in regards to 64 Byte cache line size.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 64a59d05 19-Feb-2019 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Fix for adaptive interrupt moderation

commit 63f545ed1285 ("ice: Add support for adaptive interrupt moderation")
was meant to add support for adaptive interrupt moderation but there was
an error on my part while formatting the patch, and thus only part of the
patch ended up being submitted.

This patch rectifies the error by adding the rest of the code.

Fixes: 63f545ed1285 ("ice: Add support for adaptive interrupt moderation")
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 5eda8afd 26-Feb-2019 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Add support for PF/VF promiscuous mode

Implement support for VF promiscuous mode, MAC/VLAN/MAC_VLAN and PF
multicast MAC/VLAN/MAC_VLAN promiscuous mode.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# c8b7abdd 26-Feb-2019 Bruce Allan <bruce.w.allan@intel.com>

ice: fix some function prototype and signature style issues

Put the return type on a separate line for function prototypes and
signatures that would exceed the 80-character limit if both were on
the same line.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# ad71b256 08-Feb-2019 Brett Creeley <brett.creeley@intel.com>

ice: Determine descriptor count and ring size based on PAGE_SIZE

Currently we set the default number of Tx and Rx descriptors to 128 by
default. For Rx this amounts to a full page (assuming 4K pages) because
each Rx descriptor is 32 Bytes, but for Tx it only amounts to a half
page because each Tx descriptor is 16 Bytes (assuming 4K pages).
Instead of assuming 4K pages, determine the ring size and the number of
descriptors for Tx and Rx based on a calculation using the PAGE_SIZE,
ICE_MAX_NUM_DESC, and ICE_REQ_DESC_MULTIPLE. This change is being made
to improve the performance of the driver when using the default
settings.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 5ed5d316 08-Feb-2019 Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice: Fix the calculation of ICE_MAX_MTU

Currently ICE_MAX_MTU subtracts only ETH_HLEN from max frame size and
adds ETH_FCS_LEN and VLAN_HLEN, which is not what was intended.
The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded
with parentheses.

Wrap mentioned expression and take into account VLAN double tagging.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# cf909e19 19-Dec-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Offload SCTP checksum

This patch adds the ability to offload SCTP checksum calculations to the
NIC.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 67fe64d7 19-Dec-2018 Brett Creeley <brett.creeley@intel.com>

ice: Implement getting and setting ethtool coalesce

This patch includes the following ethtool operations:

1. get_coalesce
2. set_coalesce
3. get_per_q_coalesce
4. set_per_q_coalesce

Each ITR value (current_itr/target_itr) are stored on a per
ice_ring_container basis. This is because each valid ice_ring_container
can have 1 or more rings that are tied to the same q_vector ITR index.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 03f7a986 19-Dec-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Rework queue management code for reuse

This patch reworks the queue management code to allow for reuse with the
XDP feature (to be added in a future patch).

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# ab4ab73f 19-Dec-2018 Bruce Allan <bruce.w.allan@intel.com>

ice: Add ethtool private flag to make forcing link down optional

Add new infrastructure for implementing ethtool private flags using the
existing pf->flags bitmap to store them, and add the link-down-on-close
ethtool private flag to optionally bring down the PHY link when the
interface is administratively downed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# d337f2af 26-Oct-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Use Tx|Rx in comments

In code comments, use Tx|Rx instead of tx|rx

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# df17b7e0 26-Oct-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Cosmetic formatting changes

1. Fix several cases of double spacing
2. Fix typos
3. Capitalize abbreviations

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# c5a2a4a3 26-Oct-2018 Usha Ketineni <usha.k.ketineni@intel.com>

ice: Fix to make VLAN priority tagged traffic to appear on all TCs

This patch includes below changes to resolve the issue of ETS bandwidth
shaping to work.

1. Allocation of Tx queues is accounted for based on the enabled TC's
in ice_vsi_setup_q_map() and enabled the Tx queues on those TC's via
ice_vsi_cfg_txqs()

2. Get the mapped netdev TC # for the user priority and set the priority
to TC mapping for the VSI.

Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 995c90f2 26-Oct-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Calculate guaranteed VSIs per function and use it

Currently we are setting the guar_num_vsi to equal to ICE_MAX_VSI
which is the device limit of 768. This is incorrect and could have
unintended consequences. To fix this use the valid_function's 8-bit
bitmap returned from discovering device capabilities to determine the
guar_num_vsi per function. guar_num_vsi value is then passed on to
pf->num_alloc_vsi.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 25525b69 26-Oct-2018 Dave Ertman <david.m.ertman@intel.com>

ice: Fix napi delete calls for remove

In the remove path, the vsi->netdev is being set to NULL before the call
to free vectors. This is causing the netif_napi_del call to never be made.

Add a call to ice_napi_del to the same location as the calls to
unregister_netdev and just prior to them. This will use the reverse flow
as the register and netif_napi_add calls.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9ecd25c2 26-Oct-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Remove duplicate addition of VLANs in replay path

ice_restore_vlan and active_vlans were originally put in place to
reprogram VLAN filters in the replay path. This is now done as part
of the much broader VSI rebuild/replay framework. So remove both
ice_restore_vlan and active_vlans

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# afd9d4ab 26-Oct-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Check for reset in progress during remove

The remove path does not currently check to see if a
reset is in progress before proceeding. This can cause
a resource collision resulting in various types of errors.

Check for reset in progress and wait for a reasonable
amount of time before allowing the remove to progress.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 1071a835 19-Sep-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Implement virtchnl commands for AVF support

virtchnl is a protocol/interface specification that allows the Intel
"Adaptive Virtual Function (AVF)" driver (iavf.ko) to work with more than
one physical function driver. The AVF driver sends "virtchnl commands"
(control plane only) to the PF driver over mailbox queues and the PF driver
executes these commands and returns a result to the VF, again over mailbox.

This patch adds AVF support for the ice PF driver by implementing the
following virtchnl commands:

VIRTCHNL_OP_VERSION
VIRTCHNL_OP_GET_VF_RESOURCES
VIRTCHNL_OP_RESET_VF
VIRTCHNL_OP_ADD_ETH_ADDR
VIRTCHNL_OP_DEL_ETH_ADDR
VIRTCHNL_OP_CONFIG_VSI_QUEUES
VIRTCHNL_OP_ENABLE_QUEUES
VIRTCHNL_OP_DISABLE_QUEUES
VIRTCHNL_OP_ADD_ETH_ADDR
VIRTCHNL_OP_DEL_ETH_ADDR
VIRTCHNL_OP_CONFIG_VSI_QUEUES
VIRTCHNL_OP_ENABLE_QUEUES
VIRTCHNL_OP_DISABLE_QUEUES
VIRTCHNL_OP_REQUEST_QUEUES
VIRTCHNL_OP_CONFIG_IRQ_MAP
VIRTCHNL_OP_CONFIG_RSS_KEY
VIRTCHNL_OP_CONFIG_RSS_LUT
VIRTCHNL_OP_GET_STATS
VIRTCHNL_OP_ADD_VLAN
VIRTCHNL_OP_DEL_VLAN
VIRTCHNL_OP_ENABLE_VLAN_STRIPPING
VIRTCHNL_OP_DISABLE_VLAN_STRIPPING

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 007676b4 19-Sep-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add support for VF reset events

Post VF initialization, there are a couple of different ways in which a
VF reset can be triggered. One is when the underlying PF itself goes
through a reset and other is via a VFLR interrupt. ice_reset_vf introduced
in this patch handles both these cases.

Also introduced in this patch is a helper function ice_aq_send_msg_to_vf
to send messages to VF over the mailbox queue. The PF uses this to send
reset notifications to VFs.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 8ede0178 19-Sep-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Update VSI and queue management code to handle VF VSI

Until now, all the VSI and queue management code supported only the PF
VSI type (ICE_VSI_PF). Update these flows to handle the VF VSI type
(ICE_VSI_VF) type as well.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# ddf30f7f 19-Sep-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add handler to configure SR-IOV

This patch implements parts of ice_sriov_configure and VF reset flow.

To create virtual functions (VFs), the user sets a value in num_vfs
through sysfs. This results in the kernel calling the handler for
.sriov_configure which is ice_sriov_configure.

VF setup first starts with a VF reset, followed by allocation of the VF
VSI using ice_vf_vsi_setup. Once the VF setup is complete a state bit
ICE_VF_STATE_INIT is set in the vf->states bitmap to indicate that
the VF is ready to go.

Also for VF reset to go into effect, it's necessary to issue a disable
queue command (ice_aqc_opc_dis_txqs). So this patch updates multiple
functions in the disable queue flow to take additional parameters that
distinguish if queues are being disabled due to VF reset.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 75d2b253 19-Sep-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add support to detect SR-IOV capability and mailbox queues

Mailbox queue is a type of control queue that's used for communication
between PF and VF. This patch adds code to initialize, configure and
use mailbox queues.

This patch also adds support to detect and parse SR-IOV capabilities
returned by the hardware.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9e4ab4c2 19-Sep-2018 Brett Creeley <brett.creeley@intel.com>

ice: Add support for dynamic interrupt moderation

Currently there is no support for dynamic interrupt moderation. This
patch adds some initial code to support this. The following changes
were made:

1. Currently we are using multiple members to store the interrupt
granularity (itr_gran_25/50/100/200). This is not necessary because
we can query the device to determine what the interrupt granularity
should be set to, done by a new function ice_get_itr_intrl_gran.

2. Added intrl to ice_q_vector structure to support interrupt rate
limiting.

3. Added the function ice_intrl_usecs_to_reg for converting to a value
in usecs that the device understands.

4. Added call to write to the GLINT_RATE register. Disable intrl by
default for now.

5. Changed rx/tx_itr_setting to itr_setting because having both seems
redundant because a ring is either Tx or Rx.

6. Initialize itr_setting for both Tx/Rx rings in ice_vsi_alloc_rings()

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# eb0208ec 19-Sep-2018 Preethi Banala <preethi.banala@intel.com>

ice: Split irq_tracker into sw_irq_tracker and hw_irq_tracker

For the PF driver, when mapping interrupts to queues, we need to request
IRQs from the kernel and we also have to allocate interrupts from
the device.

Similarly, when the VF driver (iavf.ko) initializes, it requests the kernel
IRQs that it needs but it can't directly allocate interrupts in the device.
Instead, it sends a mailbox message to the ice driver, which then allocates
interrupts in the device on the VF driver's behalf.

Currently both these cases end up having to reserve entries in
pf->irq_tracker but irq_tracker itself is sized based on how many vectors
the PF driver needs. Under the right circumstances, the VF driver can fail
to get entries in irq_tracker, which will result in the VF driver failing
probe.

To fix this, sw_irq_tracker and hw_irq_tracker are introduced. The
sw_irq_tracker tracks only the PF's IRQ request and doesn't play any
role in VF init. hw_irq_tracker represents the device's interrupt space.
When interrupts have to be allocated in the device for either PF or VF,
hw_irq_tracker will be looked up to see if the device has run out of
interrupts.

Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 5df7e45d 19-Sep-2018 Dave Ertman <david.m.ertman@intel.com>

ice: Change pf state behavior to protect reset path

Currently, there is no bit, or set of bits, that protect the entirety
of the reset path.

If the reset is originated by the driver, then the relevant
one of the following bits will be set when the reset is scheduled:
__ICE_PFR_REQ
__ICE_CORER_REQ
__ICE_GLOBR_REQ
This bit will not be cleared until after the rebuild has completed.

If the reset is originated by the FW, then the first the driver knows of
it will be the reception of the OICR interrupt. The __ICE_RESET_OICR_RECV
bit will be set in the interrupt handler. This will also be the indicator
in a SW originated reset that we have completed the pre-OICR tasks and
have informed the FW that a reset was requested.

To utilize these bits, change the function:
ice_is_reset_recovery_pending()
to be:
ice_is_reset_in_progress()

The new function will check all of the above bits in the pf->state and
will return a true if one or more of these bits are set.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 3b6bf296 19-Sep-2018 Bruce Allan <bruce.w.allan@intel.com>

ice: fix changing of ring descriptor size (ethtool -G)

rx_mini_pending was set to an incorrect value. This was causing EINVAL to
always be returned to 'ethtool -G'. The driver does not support mini or
jumbo rings so the respective settings should be zero.

Also, change the valid range of the number of descriptors in the rings to
make the code simpler and easier for users to understand (this removes the
valid settings of 8 and 16). Add a system log message indicating when the
number is rounded-up from what the user specifies with the 'ethtool -G'
command (i.e. when it is not a multiple of 32), and update the log message
when a user-provided value is out of range to also indicate the stride.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f934bb9b 19-Sep-2018 Bruce Allan <bruce.w.allan@intel.com>

ice: fix changing of ring descriptor size (ethtool -G)

rx_mini_pending was set to an incorrect value. This was causing EINVAL to
always be returned to 'ethtool -G'. The driver does not support mini or
jumbo rings so the respective settings should be zero.

Also, change the valid range of the number of descriptors in the rings to
make the code simpler and easier for users to understand (this removes the
valid settings of 8 and 16). Add a system log message indicating when the
number is rounded-up from what the user specifies with the 'ethtool -G'
command (i.e. when it is not a multiple of 32), and update the log message
when a user-provided value is out of range to also indicate the stride.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 8d81fa55 09-Aug-2018 Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

ice: Introduce SERVICE_DIS flag and service routine functions

This patch introduces SERVICE_DIS flag to use for stopping service task.
This flag will be checked before scheduling new tasks. Also add new
functions ice_service_task_stop to stop service task.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# b3969fd7 09-Aug-2018 Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>

ice: Add support for Tx hang, Tx timeout and malicious driver detection

When a malicious operation is detected, the firmware triggers an
interrupt, which is then picked up by the service task (specifically by
ice_handle_mdd_event). A reset is scheduled if required.

Tx hang detection works in a similar way, except the logic here monitors
the VSI's Tx queues and tries to revive them if stalled. If the hang is
not resolved, the kernel eventually calls ndo_tx_timeout, which is
handled by ice_tx_timeout.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0f9d5027 09-Aug-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Refactor VSI allocation, deletion and rebuild flow

This patch refactors aspects of the VSI allocation, deletion and rebuild
flow. Some of the more noteworthy changes are described below.

1) On reset, all switch filters applied in the hardware are lost. In
the rebuild flow, only MAC and broadcast filters are being restored.
Instead, use a new function ice_replay_all_fltr to restore all the
filters that were previously added. To do this, remove calls to
ice_remove_vsi_fltr to prevent cleaning out the internal bookkeeping
structures that ice_replay_all_fltr uses to replay filters.

2) Introduce a new state bit __ICE_PREPARED_FOR_RESET to distinguish the
PF that requested the reset (and consequently prepared for it) from
the rest of the PFs. These other PFs will prepare for reset only
when they receive an interrupt from the firmware.

3) Use new functions ice_add_vsi and ice_free_vsi to create and destroy
VSIs respectively. These functions accept a handle to uniquely
identify a VSI. This same handle is required to rebuild the VSI post
reset. To prevent confusion, the existing ice_vsi_add was renamed to
ice_vsi_init.

4) Enhance ice_vsi_setup for the upcoming SR-IOV changes and expose a
new wrapper function ice_pf_vsi_setup to create PF VSIs. Rework the
error handling path in ice_setup_pf_sw.

5) Introduce a new function ice_vsi_release_all to release all PF VSIs.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 43f8b224 09-Aug-2018 Bruce Allan <bruce.w.allan@intel.com>

ice: Change struct members from bool to u8

Recent versions of checkpatch have a new warning based on a documented
preference of Linus to not use bool in structures due to wasted space and
the size of bool is implementation dependent. For more information, see
the email thread at https://lkml.org/lkml/2017/11/21/384.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f8ba7db8 09-Aug-2018 Jacob Keller <jacob.e.keller@intel.com>

ice: Report stats for allocated queues via ethtool stats

It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.

If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.

Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.

It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.

Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.

The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.

In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.

This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# e94d4478 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Implement filter sync, NDO operations and bump version

This patch implements multiple pieces of functionality:

1. Added ice_vsi_sync_filters, which is called through the service task
to push filter updates to the hardware.

2. Add support to enable/disable promiscuous mode on an interface.
Enabling/disabling promiscuous mode on an interface results in
addition/removal of a promisc filter rule through ice_vsi_sync_filters.

3. Implement handlers for ndo_set_mac_address, ndo_change_mtu,
ndo_poll_controller and ndo_set_rx_mode.

This patch also marks the end of the driver addition by bumping up the
driver version.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 0b28b702 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Support link events, reset and rebuild

Link events are posted to a PF's admin receive queue (ARQ). This patch
adds the ability to detect and process link events.

This patch also adds the ability to process resets.

The driver can process the following resets:
1) EMP Reset (EMPR)
2) Global Reset (GLOBR)
3) Core Reset (CORER)
4) Physical Function Reset (PFR)

EMPR is the largest level of reset that the driver can handle. An EMPR
resets the manageability block and also the data path, including PHY and
link for all the PFs. The affected PFs are notified of this event through
a miscellaneous interrupt.

GLOBR is a subset of EMPR. It does everything EMPR does except that it
doesn't reset the manageability block.

CORER is a subset of GLOBR. It does everything GLOBR does but doesn't
reset PHY and link.

PFR is a subset of CORER and affects only the given physical function.
In other words, PFR can be thought of as a CORER for a single PF. Since
only the issuing PF is affected, a PFR doesn't result in the miscellaneous
interrupt being triggered.

All the resets have the following in common:
1) Tx/Rx is halted and all queues are stopped.
2) All the VSIs and filters programmed for the PF are lost and have to be
reprogrammed.
3) Control queue interfaces are reset and have to be reprogrammed.

In the rebuild flow, control queues are reinitialized, VSIs are reallocated
and filters are restored.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 5513b920 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Update Tx scheduler tree for VSI multi-Tx queue support

This patch adds the ability for a VSI to use multiple Tx queues. More
specifically, the patch
1) Provides the ability to update the Tx scheduler tree in the
firmware. The driver can configure the Tx scheduler tree by
adding/removing multiple Tx queues per TC per VSI.

2) Allows a VSI to reconfigure its Tx queues during runtime.

3) Synchronizes the Tx scheduler update operations using locks.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# fcea6f3d 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add stats and ethtool support

This patch implements a watchdog task to get packet statistics from
the device.

This patch also adds support for the following ethtool operations:

ethtool devname
ethtool -s devname [msglvl N] [msglevel type on|off]
ethtool -g|--show-ring devname
ethtool -G|--set-ring devname [rx N] [tx N]
ethtool -i|--driver devname
ethtool -d|--register-dump devname [raw on|off] [hex on|off] [file name]
ethtool -k|--show-features|--show-offload devname
ethtool -K|--features|--offload devname feature on|off
ethtool -P|--show-permaddr devname
ethtool -S|--statistics devname
ethtool -a|--show-pause devname
ethtool -A|--pause devname [autoneg on|off] [rx on|off] [tx on|off]
ethtool -r|--negotiate devname

CC: Andrew Lunn <andrew@lunn.ch>
CC: Jakub Kicinski <kubakici@wp.pl>
CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# d76a60ba 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add support for VLANs and offloads

This patch adds support for VLANs. When a VLAN is created a switch filter
is added to direct the VLAN traffic to the corresponding VSI. When a VLAN
is deleted, the filter is deleted as well.

This patch also adds support for the following hardware offloads.
1) VLAN tag insertion/stripping
2) Receive Side Scaling (RSS)
3) Tx checksum and TCP segmentation
4) Rx checksum

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 2b245cb2 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Implement transmit and NAPI support

This patch implements ice_start_xmit (the handler for ndo_start_xmit) and
related functions. ice_start_xmit ultimately calls ice_tx_map, where the
Tx descriptor is built and posted to the hardware by bumping the ring tail.

This patch also implements ice_napi_poll, which is invoked when there's an
interrupt on the VSI's queues. The interrupt can be due to either a
completed Tx or an Rx event. In case of a completed Tx/Rx event, resources
are reclaimed. Additionally, in case of an Rx event, the skb is fetched
and passed up to the network stack.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# cdedef59 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Configure VSIs for Tx/Rx

This patch configures the VSIs to be able to send and receive
packets by doing the following:

1) Initialize flexible parser to extract and include certain
fields in the Rx descriptor.

2) Add Tx queues by programming the Tx queue context (implemented in
ice_vsi_cfg_txqs). Note that adding the queues also enables (starts)
the queues.

3) Add Rx queues by programming Rx queue context (implemented in
ice_vsi_cfg_rxqs). Note that this only adds queues but doesn't start
them. The rings will be started by calling ice_vsi_start_rx_rings on
interface up.

4) Configure interrupts for VSI queues.

5) Implement ice_open and ice_stop.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 3a858ba3 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add support for VSI allocation and deallocation

This patch introduces data structures and functions to alloc/free
VSIs. The driver represents a VSI using the ice_vsi structure.

Some noteworthy points about VSI allocation:

1) A VSI is allocated in the firmware using the "add VSI" admin queue
command (implemented as ice_aq_add_vsi). The firmware returns an
identifier for the allocated VSI. The VSI context is used to program
certain aspects (loopback, queue map, etc.) of the VSI's configuration.

2) A VSI is deleted using the "free VSI" admin queue command (implemented
as ice_aq_free_vsi).

3) The driver represents a VSI using struct ice_vsi. This is allocated
and initialized as part of the ice_vsi_alloc flow, and deallocated
as part of the ice_vsi_delete flow.

4) Once the VSI is created, a netdev is allocated and associated with it.
The VSI's ring and vector related data structures are also allocated
and initialized.

5) A VSI's queues can either be contiguous or scattered. To do this, the
driver maintains a bitmap (vsi->avail_txqs) which is kept in sync with
the firmware's VSI queue allocation imap. If the VSI can't get a
contiguous queue allocation, it will fallback to scatter. This is
implemented in ice_vsi_get_qs which is called as part of the VSI setup
flow. In the release flow, the VSI's queues are released and the bitmap
is updated to reflect this by ice_vsi_put_qs.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 940b61af 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Initialize PF and setup miscellaneous interrupt

This patch continues the initialization flow as follows:

1) Allocate and initialize necessary fields (like vsi, num_alloc_vsi,
irq_tracker, etc) in the ice_pf instance.

2) Setup the miscellaneous interrupt handler. This also known as the
"other interrupt causes" (OIC) handler and is used to handle non
hotpath interrupts (like control queue events, link events,
exceptions, etc.

3) Implement a background task to process admin queue receive (ARQ)
events received by the driver.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# dc49c772 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Get MAC/PHY/link info and scheduler topology

This patch adds code to continue the initialization flow as follows:

1) Get PHY/link information and store it
2) Get default scheduler tree topology and store it
3) Get the MAC address associated with the port and store it

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 9c20346b 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Get switch config, scheduler config and device capabilities

This patch adds to the initialization flow by getting switch
configuration, scheduler configuration and device capabilities.

Switch configuration:
On boot, an L2 switch element is created in the firmware per physical
function. Each physical function is also mapped to a port, to which its
switch element is connected. In other words, this switch can be visualized
as an embedded vSwitch that can connect a physical function's virtual
station interfaces (VSIs) to the egress/ingress port. Egress/ingress
filters will be eventually created and applied on this switch element.
As part of the initialization flow, the driver gets configuration data
from this switch element and stores it.

Scheduler configuration:
The Tx scheduler is a subsystem responsible for setting and enforcing QoS.
As part of the initialization flow, the driver queries and stores the
default scheduler configuration for the given physical function.

Device capabilities:
As part of initialization, the driver has to determine what the device is
capable of (ex. max queues, VSIs, etc). This information is obtained from
the firmware and stored by the driver.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# f31e4b6f 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Start hardware initialization

This patch implements multiple pieces of the initialization flow
as follows:

1) A reset is issued to ensure a clean device state, followed
by initialization of admin queue interface.

2) Once the admin queue interface is up, clear the PF config
and transition the device to non-PXE mode.

3) Get the NVM configuration stored in the device's non-volatile
memory (NVM) using ice_init_nvm.

CC: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 7ec59eea 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add support for control queues

A control queue is a hardware interface which is used by the driver
to interact with other subsystems (like firmware, PHY, etc.). It is
implemented as a producer-consumer ring. More specifically, an
"admin queue" is a type of control queue used to interact with the
firmware.

This patch introduces data structures and functions to initialize
and teardown control/admin queues. Once the admin queue is initialized,
the driver uses it to get the firmware version.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 837f08fd 20-Mar-2018 Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice: Add basic driver framework for Intel(R) E800 Series

This patch adds a basic driver framework for the Intel(R) E800 Ethernet
Series of network devices. There is no functionality right now other than
the ability to load.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>