.. SPDX-License-Identifier: GPL-2.0+

=================================================================
Linux Base Driver for the Intel(R) Ethernet Controller 800 Series
=================================================================
6
7Intel ice Linux driver.
8Copyright(c) 2018-2021 Intel Corporation.
9
10Contents
11========
12
- Overview
- Identifying Your Adapter
- Important Notes
- Additional Features and Configurations
- Performance Optimization
18
19
20The associated Virtual Function (VF) driver for this driver is iavf.
21
22Driver information can be obtained using ethtool and lspci.
23
24For questions related to hardware requirements, refer to the documentation
25supplied with your Intel adapter. All hardware requirements listed apply to use
26with Linux.
27
28This driver supports XDP (Express Data Path) and AF_XDP zero-copy. Note that
29XDP is blocked for frame sizes larger than 3KB.
30
31
32Identifying Your Adapter
33========================
34For information on how to identify your adapter, and for the latest Intel
35network drivers, refer to the Intel Support website:
36https://www.intel.com/support
37
38
39Important Notes
40===============
41
42Packet drops may occur under receive stress
43-------------------------------------------
Devices based on the Intel(R) Ethernet Controller 800 Series are designed to
tolerate a limited amount of system latency during PCIe and DMA transactions.
If these transactions take longer than the tolerated latency, packets are
buffered longer in the device and associated memory, which may result in
dropped packets. These packet drops typically do not have a noticeable impact
on throughput and performance under standard workloads.
50
51If these packet drops appear to affect your workload, the following may improve
52the situation:
53
541) Make sure that your system's physical memory is in a high-performance
55   configuration, as recommended by the platform vendor. A common
56   recommendation is for all channels to be populated with a single DIMM
57   module.
582) In your system's BIOS/UEFI settings, select the "Performance" profile.
3) Your distribution may provide tools like "tuned," which can adjust kernel
   settings to better suit different workloads.
61
62
63Configuring SR-IOV for improved network security
64------------------------------------------------
65In a virtualized environment, on Intel(R) Ethernet Network Adapters that
66support SR-IOV, the virtual function (VF) may be subject to malicious behavior.
67Software-generated layer two frames, like IEEE 802.3x (link flow control), IEEE
68802.1Qbb (priority based flow-control), and others of this type, are not
69expected and can throttle traffic between the host and the virtual switch,
70reducing performance. To resolve this issue, and to ensure isolation from
71unintended traffic streams, configure all SR-IOV enabled ports for VLAN tagging
72from the administrative interface on the PF. This configuration allows
73unexpected, and potentially malicious, frames to be dropped.
74
75See "Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports" later in this
76README for configuration instructions.
77
78
79Do not unload port driver if VF with active VM is bound to it
80-------------------------------------------------------------
Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the unload command will
complete.
85
86
87Additional Features and Configurations
88======================================
89
90ethtool
91-------
92The driver utilizes the ethtool interface for driver configuration and
93diagnostics, as well as displaying statistical information. The latest ethtool
94version is required for this functionality. Download it at:
95https://kernel.org/pub/software/network/ethtool/
96
97NOTE: The rx_bytes value of ethtool does not match the rx_bytes value of
98Netdev, due to the 4-byte CRC being stripped by the device. The difference
99between the two rx_bytes values will be 4 x the number of Rx packets. For
100example, if Rx packets are 10 and Netdev (software statistics) displays
101rx_bytes as "X", then ethtool (hardware statistics) will display rx_bytes as
102"X+40" (4 bytes CRC x 10 packets).
103
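The arithmetic in the note above can be sketched as a small shell helper. The
function name is hypothetical, and the sketch assumes the 4-byte CRC is the
only difference between the two counters.

```shell
#!/bin/sh
# Hypothetical helper: predict the rx_bytes value that ethtool (hardware
# statistics) reports, given the Netdev (software) rx_bytes and the number
# of Rx packets. Assumes the only difference is the 4-byte CRC per packet.
expected_ethtool_rx_bytes() {
    netdev_bytes=$1
    rx_packets=$2
    echo $(( $netdev_bytes + 4 * $rx_packets ))
}

# The example from the note, with X = 1000 bytes over 10 packets.
expected_ethtool_rx_bytes 1000 10    # prints 1040
```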
104
105Viewing Link Messages
106---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set the console log level to eight by entering the following::
110
111  # dmesg -n 8
112
113NOTE: This setting is not saved across reboots.
114
115
116Dynamic Device Personalization
117------------------------------
118Dynamic Device Personalization (DDP) allows you to change the packet processing
119pipeline of a device by applying a profile package to the device at runtime.
120Profiles can be used to, for example, add support for new protocols, change
121existing protocols, or change default settings. DDP profiles can also be rolled
122back without rebooting the system.
123
124The DDP package loads during device initialization. The driver looks for
125``intel/ice/ddp/ice.pkg`` in your firmware root (typically ``/lib/firmware/``
126or ``/lib/firmware/updates/``) and checks that it contains a valid DDP package
127file.
128
129NOTE: Your distribution should likely have provided the latest DDP file, but if
130ice.pkg is missing, you can find it in the linux-firmware repository or from
131intel.com.
132
133If the driver is unable to load the DDP package, the device will enter Safe
134Mode. Safe Mode disables advanced and performance features and supports only
135basic traffic and minimal functionality, such as updating the NVM or
136downloading a new driver or DDP package. Safe Mode only applies to the affected
137physical function and does not impact any other PFs. See the "Intel(R) Ethernet
138Adapters and Devices User Guide" for more details on DDP and Safe Mode.
139
140NOTES:
141
142- If you encounter issues with the DDP package file, you may need to download
143  an updated driver or DDP package file. See the log messages for more
144  information.
145
146- The ice.pkg file is a symbolic link to the default DDP package file.
147
148- You cannot update the DDP package if any PF drivers are already loaded. To
149  overwrite a package, unload all PFs and then reload the driver with the new
150  package.
151
152- Only the first loaded PF per device can download a package for that device.
153
154You can install specific DDP package files for different physical devices in
155the same system. To install a specific DDP package file:
156
1571. Download the DDP package file you want for your device.
158
1592. Rename the file ice-xxxxxxxxxxxxxxxx.pkg, where 'xxxxxxxxxxxxxxxx' is the
160   unique 64-bit PCI Express device serial number (in hex) of the device you
161   want the package downloaded on. The filename must include the complete
162   serial number (including leading zeros) and be all lowercase. For example,
163   if the 64-bit serial number is b887a3ffffca0568, then the file name would be
164   ice-b887a3ffffca0568.pkg.
165
166   To find the serial number from the PCI bus address, you can use the
167   following command::
168
169     # lspci -vv -s af:00.0 | grep -i Serial
170     Capabilities: [150 v1] Device Serial Number b8-87-a3-ff-ff-ca-05-68
171
172   You can use the following command to format the serial number without the
173   dashes::
174
175     # lspci -vv -s af:00.0 | grep -i Serial | awk '{print $7}' | sed s/-//g
176     b887a3ffffca0568
177
1783. Copy the renamed DDP package file to
179   ``/lib/firmware/updates/intel/ice/ddp/``. If the directory does not yet
180   exist, create it before copying the file.
181
1824. Unload all of the PFs on the device.
183
1845. Reload the driver with the new package.
185
186NOTE: The presence of a device-specific DDP package file overrides the loading
187of the default DDP package file (ice.pkg).
188
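The renaming rule in step 2 can be sketched as a shell function. The function
name is hypothetical; the input is assumed to be the "Device Serial Number"
string exactly as lspci prints it.

```shell
#!/bin/sh
# Hypothetical helper: turn the serial number printed by lspci (for
# example b8-87-a3-ff-ff-ca-05-68) into the device-specific DDP package
# file name: all lowercase, no dashes, leading zeros kept.
ddp_pkg_name() {
    serial=$(printf '%s' "$1" | sed 's/-//g' | tr 'A-F' 'a-f')
    # Reject anything that is not a hex digit.
    case "$serial" in
        *[!0-9a-f]*) echo "invalid serial number: $1" >&2; return 1 ;;
    esac
    # A 64-bit serial number is exactly 16 hex digits.
    if [ "${#serial}" -ne 16 ]; then
        echo "expected 16 hex digits, got ${#serial}" >&2
        return 1
    fi
    printf 'ice-%s.pkg\n' "$serial"
}

ddp_pkg_name "b8-87-a3-ff-ff-ca-05-68"    # prints ice-b887a3ffffca0568.pkg
```

The resulting name is the one to use when copying the package file into
``/lib/firmware/updates/intel/ice/ddp/`` in step 3.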
189
190Intel(R) Ethernet Flow Director
191-------------------------------
192The Intel Ethernet Flow Director performs the following tasks:
193
194- Directs receive packets according to their flows to different queues
195- Enables tight control on routing a flow in the platform
196- Matches flows and CPU cores for flow affinity
197
198NOTE: This driver supports the following flow types:
199
200- IPv4
201- TCPv4
202- UDPv4
203- SCTPv4
204- IPv6
205- TCPv6
206- UDPv6
207- SCTPv6
208
209Each flow type supports valid combinations of IP addresses (source or
210destination) and UDP/TCP/SCTP ports (source and destination). You can supply
211only a source IP address, a source IP address and a destination port, or any
212combination of one or more of these four parameters.
213
214NOTE: This driver allows you to filter traffic based on a user-defined flexible
215two-byte pattern and offset by using the ethtool user-def and mask fields. Only
216L3 and L4 flow types are supported for user-defined flexible filters. For a
217given flow type, you must clear all Intel Ethernet Flow Director filters before
218changing the input set (for that flow type).
219
220
221Flow Director Filters
222---------------------
223Flow Director filters are used to direct traffic that matches specified
224characteristics. They are enabled through ethtool's ntuple interface. To enable
225or disable the Intel Ethernet Flow Director and these filters::
226
227  # ethtool -K <ethX> ntuple <off|on>
228
229NOTE: When you disable ntuple filters, all the user programmed filters are
230flushed from the driver cache and hardware. All needed filters must be re-added
231when ntuple is re-enabled.
232
233To display all of the active filters::
234
235  # ethtool -u <ethX>
236
237To add a new filter::
238
239  # ethtool -U <ethX> flow-type <type> src-ip <ip> [m <ip_mask>] dst-ip <ip>
240  [m <ip_mask>] src-port <port> [m <port_mask>] dst-port <port> [m <port_mask>]
241  action <queue>
242
243  Where:
244    <ethX> - the Ethernet device to program
245    <type> - can be ip4, tcp4, udp4, sctp4, ip6, tcp6, udp6, sctp6
246    <ip> - the IP address to match on
247    <ip_mask> - the IPv4 address to mask on
248              NOTE: These filters use inverted masks.
249    <port> - the port number to match on
250    <port_mask> - the 16-bit integer for masking
251              NOTE: These filters use inverted masks.
252    <queue> - the queue to direct traffic toward (-1 discards the
253              matched traffic)
254
255To delete a filter::
256
257  # ethtool -U <ethX> delete <N>
258
259  Where <N> is the filter ID displayed when printing all the active filters,
260  and may also have been specified using "loc <N>" when adding the filter.
261
262EXAMPLES:
263
To add a filter that directs packets to queue 2::
265
266  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
267  192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
268
269To set a filter using only the source and destination IP address::
270
271  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
272  192.168.10.2 action 2 [loc 1]
273
274To set a filter based on a user-defined pattern and offset::
275
276  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
277  192.168.10.2 user-def 0x4FFFF action 2 [loc 1]
278
279  where the value of the user-def field contains the offset (4 bytes) and
280  the pattern (0xffff).
281
282To match TCP traffic sent from 192.168.0.1, port 5300, directed to 192.168.0.5,
283port 80, and then send it to queue 7::
284
285  # ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5
286  src-port 5300 dst-port 80 action 7
287
288To add a TCPv4 filter with a partial mask for a source IP subnet::
289
290  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.0.0 m 0.255.255.255 dst-ip
291  192.168.5.12 src-port 12600 dst-port 31 action 12
292
293NOTES:
294
295For each flow-type, the programmed filters must all have the same matching
296input set. For example, issuing the following two commands is acceptable::
297
298  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
299  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
300
301Issuing the next two commands, however, is not acceptable, since the first
302specifies src-ip and the second specifies dst-ip::
303
304  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
305  # ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
306
307The second command will fail with an error. You may program multiple filters
308with the same fields, using different values, but, on one device, you may not
309program two tcp4 filters with different matching fields.
310
311The ice driver does not support matching on a subportion of a field, thus
312partial mask fields are not supported.
313
314
315Flex Byte Flow Director Filters
316-------------------------------
317The driver also supports matching user-defined data within the packet payload.
318This flexible data is specified using the "user-def" field of the ethtool
319command in the following way:
320
321.. table::
322
323    ============================== ============================
324    ``31    28    24    20    16`` ``15    12    8    4    0``
325    ``offset into packet payload`` ``2 bytes of flexible data``
326    ============================== ============================
327
328For example,
329
330::
331
332  ... user-def 0x4FFFF ...
333
334tells the filter to look 4 bytes into the payload and match that value against
3350xFFFF. The offset is based on the beginning of the payload, and not the
336beginning of the packet. Thus
337
338::
339
340  flow-type tcp4 ... user-def 0x8BEAF ...
341
342would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
343TCP/IPv4 payload.
344
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
Thus to match the first byte of the payload, you must actually add 4 bytes to
the offset. Also note that ip4 filters match both ICMP frames and raw
(unknown) ip4 frames, where the payload will be the L3 payload of the IP4
frame.
350
351The maximum offset is 64. The hardware will only read up to 64 bytes of data
352from the payload. The offset must be even because the flexible data is 2 bytes
353long and must be aligned to byte 0 of the packet payload.
354
355The user-defined flexible offset is also considered part of the input set and
356cannot be programmed separately for multiple filters of the same type. However,
357the flexible data is not part of the input set and multiple filters may use the
358same offset but match against different data.
359
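As a sketch of the layout in the table above, the following shell function
packs an offset and a 2-byte pattern into a user-def value. The function name
is hypothetical, and the range checks simply restate the limits given in this
section (even offset, maximum 64).

```shell
#!/bin/sh
# Hypothetical helper: build the value for ethtool's user-def field.
# The payload offset goes in the bits above bit 15 and the 2 bytes of
# flexible data go in bits 15..0.
flex_user_def() {
    offset=$(( $1 ))
    pattern=$(( $2 ))
    if [ "$offset" -lt 0 ] || [ "$offset" -gt 64 ] || [ $(( $offset % 2 )) -ne 0 ]; then
        echo "offset must be even and in the range 0..64" >&2
        return 1
    fi
    if [ "$pattern" -lt 0 ] || [ "$pattern" -gt 65535 ]; then
        echo "pattern must fit in 2 bytes" >&2
        return 1
    fi
    printf '0x%X\n' $(( ($offset << 16) | $pattern ))
}

flex_user_def 4 0xFFFF    # prints 0x4FFFF
flex_user_def 8 0xBEAF    # prints 0x8BEAF
```

Both results match the user-def values used in the examples in this section.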
360
361RSS Hash Flow
362-------------
RSS Hash Flow allows you to set the hash bytes per flow type, using any
combination of one or more options for Receive Side Scaling (RSS) hash byte
configuration.
365
366::
367
368  # ethtool -N <ethX> rx-flow-hash <type> <option>
369
  Where <type> is:
    tcp4    signifying TCP over IPv4
    udp4    signifying UDP over IPv4
    gtpc4   signifying GTP-C over IPv4
    gtpc4t  signifying GTP-C (include TEID) over IPv4
    gtpu4   signifying GTP-U over IPv4
    gtpu4e  signifying GTP-U and Extension Header over IPv4
    gtpu4u  signifying GTP-U PSC Uplink over IPv4
    gtpu4d  signifying GTP-U PSC Downlink over IPv4
    tcp6    signifying TCP over IPv6
    udp6    signifying UDP over IPv6
    gtpc6   signifying GTP-C over IPv6
    gtpc6t  signifying GTP-C (include TEID) over IPv6
    gtpu6   signifying GTP-U over IPv6
    gtpu6e  signifying GTP-U and Extension Header over IPv6
    gtpu6u  signifying GTP-U PSC Uplink over IPv6
    gtpu6d  signifying GTP-U PSC Downlink over IPv6
  And <option> is one or more of:
    s     Hash on the IP source address of the Rx packet.
    d     Hash on the IP destination address of the Rx packet.
    f     Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
    n     Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
    e     Hash on the GTP TEID (4 bytes) of the Rx packet.
393
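As an illustration of the syntax above, the following sketch assembles an
rx-flow-hash command line and checks the option letters before printing it.
The function name and the interface name eth0 are placeholders. The function
only prints the command and does not run ethtool.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) an rx-flow-hash command after
# checking that every option letter is one of those listed above
# (s, d, f, n, e). The flow type is passed through unchecked.
rss_hash_cmd() {
    dev=$1
    flow=$2
    opts=$3
    case "$opts" in
        ""|*[!sdfne]*)
            echo "options must be one or more of: s d f n e" >&2
            return 1 ;;
    esac
    printf 'ethtool -N %s rx-flow-hash %s %s\n' "$dev" "$flow" "$opts"
}

# Hash TCP over IPv4 on source/destination address and on bytes 0-3 of
# the Layer 4 header.
rss_hash_cmd eth0 tcp4 sdfn
```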
394
395Accelerated Receive Flow Steering (aRFS)
396----------------------------------------
397Devices based on the Intel(R) Ethernet Controller 800 Series support
398Accelerated Receive Flow Steering (aRFS) on the PF. aRFS is a load-balancing
399mechanism that allows you to direct packets to the same CPU where an
400application is running or consuming the packets in that flow.
401
402NOTES:
403
404- aRFS requires that ntuple filtering is enabled via ethtool.
405- aRFS support is limited to the following packet types:
406
407    - TCP over IPv4 and IPv6
408    - UDP over IPv4 and IPv6
409    - Nonfragmented packets
410
411- aRFS only supports Flow Director filters, which consist of the
412  source/destination IP addresses and source/destination ports.
413- aRFS and ethtool's ntuple interface both use the device's Flow Director. aRFS
414  and ntuple features can coexist, but you may encounter unexpected results if
415  there's a conflict between aRFS and ntuple requests. See "Intel(R) Ethernet
416  Flow Director" for additional information.
417
418To set up aRFS:
419
4201. Enable the Intel Ethernet Flow Director and ntuple filters using ethtool.
421
422::
423
424   # ethtool -K <ethX> ntuple on
425
4262. Set up the number of entries in the global flow table. For example:
427
428::
429
430   # NUM_RPS_ENTRIES=16384
431   # echo $NUM_RPS_ENTRIES > /proc/sys/net/core/rps_sock_flow_entries
432
4333. Set up the number of entries in the per-queue flow table. For example:
434
435::
436
437   # NUM_RX_QUEUES=64
438   # for file in /sys/class/net/$IFACE/queues/rx-*/rps_flow_cnt; do
439   # echo $(($NUM_RPS_ENTRIES/$NUM_RX_QUEUES)) > $file;
440   # done
441
4424. Disable the IRQ balance daemon (this is only a temporary stop of the service
443   until the next reboot).
444
445::
446
447   # systemctl stop irqbalance
448
4495. Configure the interrupt affinity.
450
451   See ``/Documentation/core-api/irq/irq-affinity.rst``
452
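The per-queue value written in step 3 is the global table size divided by the
number of Rx queues. A minimal sketch of that calculation, using a
hypothetical function name and the example values from steps 2 and 3:

```shell
#!/bin/sh
# Hypothetical helper: compute the rps_flow_cnt value for each Rx queue
# by dividing the global flow table size evenly across the queues.
rps_flow_cnt() {
    total_entries=$1
    rx_queues=$2
    if [ "$rx_queues" -le 0 ]; then
        echo "number of Rx queues must be positive" >&2
        return 1
    fi
    echo $(( $total_entries / $rx_queues ))
}

# 16384 global entries spread over 64 Rx queues.
rps_flow_cnt 16384 64    # prints 256
```

The division truncates, so choosing a global table size that is a multiple of
the queue count avoids leaving entries unused.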
453
454To disable aRFS using ethtool::
455
456  # ethtool -K <ethX> ntuple off
457
458NOTE: This command will disable ntuple filters and clear any aRFS filters in
459software and hardware.
460
461Example Use Case:
462
4631. Set the server application on the desired CPU (e.g., CPU 4).
464
465::
466
467   # taskset -c 4 netserver
468
4692. Use netperf to route traffic from the client to CPU 4 on the server with
470   aRFS configured. This example uses TCP over IPv4.
471
472::
473
474   # netperf -H <Host IPv4 Address> -t TCP_STREAM
475
476
477Enabling Virtual Functions (VFs)
478--------------------------------
479Use sysfs to enable virtual functions (VF).
480
481For example, you can create 4 VFs as follows::
482
483  # echo 4 > /sys/class/net/<ethX>/device/sriov_numvfs
484
485To disable VFs, write 0 to the same file::
486
487  # echo 0 > /sys/class/net/<ethX>/device/sriov_numvfs
488
489The maximum number of VFs for the ice driver is 256 total (all ports). To check
490how many VFs each PF supports, use the following command::
491
492  # cat /sys/class/net/<ethX>/device/sriov_totalvfs
493
494Note: You cannot use SR-IOV when link aggregation (LAG)/bonding is active, and
495vice versa. To enforce this, the driver checks for this mutual exclusion.
496
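The two sysfs files above can be combined into a simple guard. The sketch
below uses a hypothetical function name and takes the device directory as an
argument (for example ``/sys/class/net/<ethX>/device``); it refuses to write a
VF count larger than the value reported in sriov_totalvfs.

```shell
#!/bin/sh
# Hypothetical helper: write a VF count to sriov_numvfs only if it does
# not exceed sriov_totalvfs.
#   $1 - device directory, for example /sys/class/net/<ethX>/device
#   $2 - requested number of VFs
set_numvfs() {
    dev_dir=$1
    want=$2
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, but only $total are supported" >&2
        return 1
    fi
    echo "$want" > "$dev_dir/sriov_numvfs"
}
```

On a real system, run this as root. To change from one nonzero VF count to
another, write 0 to sriov_numvfs first.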
497
498Displaying VF Statistics on the PF
499----------------------------------
500Use the following command to display the statistics for the PF and its VFs::
501
502  # ip -s link show dev <ethX>
503
504NOTE: The output of this command can be very large due to the maximum number of
505possible VFs.
506
507The PF driver will display a subset of the statistics for the PF and for all
508VFs that are configured. The PF will always print a statistics block for each
509of the possible VFs, and it will show zero for all unconfigured VFs.
510
511
512Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports
513--------------------------------------------------------
514To configure VLAN tagging for the ports on an SR-IOV enabled adapter, use the
515following command. The VLAN configuration should be done before the VF driver
516is loaded or the VM is booted. The VF is not aware of the VLAN tag being
517inserted on transmit and removed on received frames (sometimes called "port
518VLAN" mode).
519
520::
521
522  # ip link set dev <ethX> vf <id> vlan <vlan id>
523
524For example, the following will configure PF eth0 and the first VF on VLAN 10::
525
526  # ip link set dev eth0 vf 0 vlan 10
527
528
529Enabling a VF link if the port is disconnected
530----------------------------------------------
531If the physical function (PF) link is down, you can force link up (from the
532host PF) on any virtual functions (VF) bound to the PF.
533
534For example, to force link up on VF 0 bound to PF eth0::
535
536  # ip link set eth0 vf 0 state enable
537
538Note: If the command does not work, it may not be supported by your system.
539
540
541Setting the MAC Address for a VF
542--------------------------------
543To change the MAC address for the specified VF::
544
545  # ip link set <ethX> vf 0 mac <address>
546
547For example::
548
549  # ip link set <ethX> vf 0 mac 00:01:02:03:04:05
550
551This setting lasts until the PF is reloaded.
552
553NOTE: Assigning a MAC address for a VF from the host will disable any
554subsequent requests to change the MAC address from within the VM. This is a
555security feature. The VM is not aware of this restriction, so if this is
556attempted in the VM, it will trigger MDD events.
557
558
559Trusted VFs and VF Promiscuous Mode
560-----------------------------------
561This feature allows you to designate a particular VF as trusted and allows that
562trusted VF to request selective promiscuous mode on the Physical Function (PF).
563
564To set a VF as trusted or untrusted, enter the following command in the
565Hypervisor::
566
567  # ip link set dev <ethX> vf 1 trust [on|off]
568
NOTE: It's important to set the VF to trusted before setting promiscuous mode.
If the VF is not trusted, the PF will ignore promiscuous mode requests from the
VF. If the VF becomes trusted after the VF driver is loaded, you must make a
new request to set the VF to promiscuous.
573
574Once the VF is designated as trusted, use the following commands in the VM to
575set the VF to promiscuous mode.
576
577For promiscuous all::
578
579  # ip link set <ethX> promisc on
580  Where <ethX> is a VF interface in the VM
581
582For promiscuous Multicast::
583
584  # ip link set <ethX> allmulticast on
585  Where <ethX> is a VF interface in the VM
586
587NOTE: By default, the ethtool private flag vf-true-promisc-support is set to
588"off," meaning that promiscuous mode for the VF will be limited. To set the
589promiscuous mode for the VF to true promiscuous and allow the VF to see all
590ingress traffic, use the following command::
591
592  # ethtool --set-priv-flags <ethX> vf-true-promisc-support on
593
594The vf-true-promisc-support private flag does not enable promiscuous mode;
595rather, it designates which type of promiscuous mode (limited or true) you will
596get when you enable promiscuous mode using the ip link commands above. Note
597that this is a global setting that affects the entire device. However, the
598vf-true-promisc-support private flag is only exposed to the first PF of the
599device. The PF remains in limited promiscuous mode regardless of the
600vf-true-promisc-support setting.
601
602Next, add a VLAN interface on the VF interface. For example::
603
604  # ip link add link eth2 name eth2.100 type vlan id 100
605
606Note that the order in which you set the VF to promiscuous mode and add the
607VLAN interface does not matter (you can do either first). The result in this
608example is that the VF will get all traffic that is tagged with VLAN 100.
609
610
611Malicious Driver Detection (MDD) for VFs
612----------------------------------------
613Some Intel Ethernet devices use Malicious Driver Detection (MDD) to detect
614malicious traffic from the VF and disable Tx/Rx queues or drop the offending
615packet until a VF driver reset occurs. You can view MDD messages in the PF's
616system log using the dmesg command.
617
618- If the PF driver logs MDD events from the VF, confirm that the correct VF
619  driver is installed.
620- To restore functionality, you can manually reload the VF or VM or enable
621  automatic VF resets.
622- When automatic VF resets are enabled, the PF driver will immediately reset
623  the VF and reenable queues when it detects MDD events on the receive path.
624- If automatic VF resets are disabled, the PF will not automatically reset the
625  VF when it detects MDD events.
626
627To enable or disable automatic VF resets, use the following command::
628
629  # ethtool --set-priv-flags <ethX> mdd-auto-reset-vf on|off
630
631
632MAC and VLAN Anti-Spoofing Feature for VFs
633------------------------------------------
634When a malicious driver on a Virtual Function (VF) interface attempts to send a
635spoofed packet, it is dropped by the hardware and not transmitted.
636
637NOTE: This feature can be disabled for a specific VF::
638
639  # ip link set <ethX> vf <vf id> spoofchk {off|on}
640
641
642Jumbo Frames
643------------
644Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
645to a value larger than the default value of 1500.
646
Use the ifconfig command to increase the MTU size. For example, enter the
following where <ethX> is the interface name::
649
650  # ifconfig <ethX> mtu 9000 up
651
652Alternatively, you can use the ip command as follows::
653
654  # ip link set mtu 9000 dev <ethX>
655  # ip link set up dev <ethX>
656
657This setting is not saved across reboots.
658
659
660NOTE: The maximum MTU setting for jumbo frames is 9702. This corresponds to the
661maximum jumbo frame size of 9728 bytes.
662
663NOTE: This driver will attempt to use multiple page sized buffers to receive
664each jumbo packet. This should help to avoid buffer starvation issues when
665allocating receive packets.
666
667NOTE: Packet loss may have a greater impact on throughput when you use jumbo
668frames. If you observe a drop in performance after enabling jumbo frames,
669enabling flow control may mitigate the issue.
670
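The two limits quoted above differ by 26 bytes (9728 - 9702). The sketch below
applies that difference to other MTU values. The function name is
hypothetical. Treating the 26 bytes as a fixed per-frame overhead is an
assumption drawn only from those two numbers; it would be consistent with a
14-byte Ethernet header, a 4-byte CRC, and room for two 4-byte VLAN tags.

```shell
#!/bin/sh
# Overhead implied by the limits in this section: 9728 - 9702 = 26 bytes.
# Applying it to every MTU is an assumption, not a documented rule.
L2_OVERHEAD=26

# Hypothetical helper: estimate the on-wire frame size for a given MTU.
frame_size_for_mtu() {
    echo $(( $1 + $L2_OVERHEAD ))
}

frame_size_for_mtu 9702    # prints 9728
frame_size_for_mtu 1500    # prints 1526
```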
671
672Speed and Duplex Configuration
673------------------------------
674In addressing speed and duplex configuration issues, you need to distinguish
675between copper-based adapters and fiber-based adapters.
676
677In the default mode, an Intel(R) Ethernet Network Adapter using copper
678connections will attempt to auto-negotiate with its link partner to determine
679the best setting. If the adapter cannot establish link with the link partner
680using auto-negotiation, you may need to manually configure the adapter and link
681partner to identical settings to establish link and pass packets. This should
682only be needed when attempting to link with an older switch that does not
683support auto-negotiation or one that has been forced to a specific speed or
684duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
685and higher cannot be forced. Use the autonegotiation advertising setting to
686manually set devices for 1 Gbps and higher.
687
688Speed, duplex, and autonegotiation advertising are configured through the
689ethtool utility. For the latest version, download and install ethtool from the
690following website:
691
692   https://kernel.org/pub/software/network/ethtool/
693
694To see the speed configurations your device supports, run the following::
695
696  # ethtool <ethX>
697
698Caution: Only experienced network administrators should force speed and duplex
699or change autonegotiation advertising manually. The settings at the switch must
700always match the adapter settings. Adapter performance may suffer or your
701adapter may not operate if you configure the adapter differently from your
702switch.
703
704
705Data Center Bridging (DCB)
706--------------------------
707NOTE: The kernel assumes that TC0 is available, and will disable Priority Flow
708Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
709enabled when setting up DCB on your switch.
710
DCB is a Quality of Service (QoS) implementation in hardware. It uses the VLAN
priority tag (802.1p) to filter traffic, so traffic can be classified into 8
different priorities. It also enables priority flow control (802.1Qbb), which
can reduce or eliminate dropped packets during network stress. Bandwidth can
be allocated to each of these priorities, and the allocation is enforced at
the hardware level (802.1Qaz).
717
718DCB is normally configured on the network using the DCBX protocol (802.1Qaz), a
719specialization of LLDP (802.1AB). The ice driver supports the following
720mutually exclusive variants of DCBX support:
721
7221) Firmware-based LLDP Agent
7232) Software-based LLDP Agent
724
725In firmware-based mode, firmware intercepts all LLDP traffic and handles DCBX
726negotiation transparently for the user. In this mode, the adapter operates in
727"willing" DCBX mode, receiving DCB settings from the link partner (typically a
728switch). The local user can only query the negotiated DCB configuration. For
729information on configuring DCBX parameters on a switch, please consult the
730switch manufacturer's documentation.
731
732In software-based mode, LLDP traffic is forwarded to the network stack and user
733space, where a software agent can handle it. In this mode, the adapter can
734operate in either "willing" or "nonwilling" DCBX mode and DCB configuration can
735be both queried and set locally. This mode requires the FW-based LLDP Agent to
736be disabled.
737
738NOTE:
739
740- You can enable and disable the firmware-based LLDP Agent using an ethtool
741  private flag. Refer to the "FW-LLDP (Firmware Link Layer Discovery Protocol)"
742  section in this README for more information.
743- In software-based DCBX mode, you can configure DCB parameters using software
744  LLDP/DCBX agents that interface with the Linux kernel's DCB Netlink API. We
745  recommend using OpenLLDP as the DCBX agent when running in software mode. For
746  more information, see the OpenLLDP man pages and
747  https://github.com/intel/openlldp.
748- The driver implements the DCB netlink interface layer to allow the user space
749  to communicate with the driver and query DCB configuration for the port.
750- iSCSI with DCB is not supported.
751
752
753FW-LLDP (Firmware Link Layer Discovery Protocol)
754------------------------------------------------
755Use ethtool to change FW-LLDP settings. The FW-LLDP setting is per port and
756persists across boots.
757
758To enable LLDP::
759
760  # ethtool --set-priv-flags <ethX> fw-lldp-agent on
761
762To disable LLDP::
763
764  # ethtool --set-priv-flags <ethX> fw-lldp-agent off
765
766To check the current LLDP setting::
767
768  # ethtool --show-priv-flags <ethX>
769
NOTE: You must enable the UEFI HII "LLDP Agent" attribute for this setting to
take effect. If "LLDP Agent" is set to disabled, you cannot enable it from the
OS.
773
774
775Flow Control
776------------
777Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
778receiving and transmitting pause frames for ice. When transmit is enabled,
779pause frames are generated when the receive packet buffer crosses a predefined
780threshold. When receive is enabled, the transmit unit will halt for the time
781delay specified when a pause frame is received.
782
783NOTE: You must have a flow control capable link partner.
784
785Flow Control is disabled by default.
786
787Use ethtool to change the flow control settings.
788
789To enable or disable Rx or Tx Flow Control::
790
791  # ethtool -A <ethX> rx <on|off> tx <on|off>
792
793Note: This command only enables or disables Flow Control if auto-negotiation is
794disabled. If auto-negotiation is enabled, this command changes the parameters
795used for auto-negotiation with the link partner.
796
797Note: Flow Control auto-negotiation is part of link auto-negotiation. Depending
798on your device, you may not be able to change the auto-negotiation setting.
799
800NOTE:
801
- The ice driver requires flow control on both the port and link partner. If
  flow control is disabled on one of the sides, the port may appear to hang
  under heavy traffic.
805- You may encounter issues with link-level flow control (LFC) after disabling
806  DCB. The LFC status may show as enabled but traffic is not paused. To resolve
807  this issue, disable and reenable LFC using ethtool::
808
809   # ethtool -A <ethX> rx off tx off
810   # ethtool -A <ethX> rx on tx on
811
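Because the effect of ``ethtool -A`` depends on whether auto-negotiation is
enabled, it can help to look at the current pause parameters (as reported by
``ethtool -a``) before changing them. The sketch below condenses that output
into key=value pairs. The function name ``pause_summary`` is hypothetical, and
the "Autonegotiate:", "RX:" and "TX:" labels are assumed from common ethtool
releases.

```shell
# Hypothetical helper: summarize "ethtool -a" output read from stdin.
# The label names are an assumption; adjust the pattern if your ethtool
# prints them differently.
pause_summary() {
    awk -F: '/^(Autonegotiate|RX|TX):/ {
        gsub(/[ \t]/, "", $2)   # values are padded with tabs or spaces
        print $1 "=" $2
    }'
}
```

If the summary shows ``Autonegotiate=on``, the rx/tx values passed to
``ethtool -A`` change what is advertised to the link partner rather than
forcing the setting directly.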
812
813NAPI
814----
815
816This driver supports NAPI (Rx polling mode).
817
818See :ref:`Documentation/networking/napi.rst <napi>` for more information.
819
820MACVLAN
821-------
This driver supports MACVLAN. To test for kernel MACVLAN support, check whether
the MACVLAN driver is loaded: run ``lsmod | grep macvlan`` to see if it is
loaded, or run ``modprobe macvlan`` to try to load it.
826
827NOTE:
828
829- In passthru mode, you can only set up one MACVLAN device. It will inherit the
830  MAC address of the underlying PF (Physical Function) device.
831
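The module check can also be scripted. The sketch below looks for an exact
module name in ``/proc/modules``, the listing that ``lsmod`` formats. The
function name ``module_loaded`` and its optional second argument are
hypothetical. Note that a driver built into the kernel image does not appear
in this listing.

```shell
# Hypothetical helper: return 0 if the named module is listed as loaded.
# $1 = module name, $2 = listing to read (defaults to /proc/modules; the
# override exists only so the function can be tried on a sample file).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}
```

For example, ``module_loaded macvlan || modprobe macvlan`` tries to load the
driver only when it is not already listed.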
832
833IEEE 802.1ad (QinQ) Support
834---------------------------
835The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
836IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
837"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
838allow L2 tunneling and the ability to segregate traffic within a particular
839VLAN ID, among other uses.
840
841NOTES:
842
843- Receive checksum offloads and VLAN acceleration are not supported for 802.1ad
844  (QinQ) packets.
845
846- 0x88A8 traffic will not be received unless VLAN stripping is disabled with
847  the following command::
848
849    # ethtool -K <ethX> rxvlan off
850
- 0x88A8/0x8100 double VLANs cannot be used with 0x8100 or 0x8100/0x8100 VLANs
  configured on the same port. 0x88A8/0x8100 traffic will not be received if
  0x8100 VLANs are configured.
854
855- The VF can only transmit 0x88A8/0x8100 (i.e., 802.1ad/802.1Q) traffic if:
856
857    1) The VF is not assigned a port VLAN.
858    2) spoofchk is disabled from the PF. If you enable spoofchk, the VF will
859       not transmit 0x88A8/0x8100 traffic.
860
861- The VF may not receive all network traffic based on the Inner VLAN header
862  when VF true promiscuous mode (vf-true-promisc-support) and double VLANs are
863  enabled in SR-IOV mode.
864
865The following are examples of how to configure 802.1ad (QinQ)::
866
867  # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
868  # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
869
Where "24" and "371" are example VLAN IDs.
871
872
873Tunnel/Overlay Stateless Offloads
874---------------------------------
875Supported tunnels and overlays include VXLAN, GENEVE, and others depending on
876hardware and software configuration. Stateless offloads are enabled by default.
877
878To view the current state of all offloads::
879
880  # ethtool -k <ethX>
881
882
883UDP Segmentation Offload
884------------------------
UDP Segmentation Offload allows the adapter to offload transmit segmentation of
UDP packets with payloads up to 64K into valid Ethernet frames. Because the
adapter hardware can complete data segmentation much faster than operating
system software, this feature may improve transmission performance and may
reduce host CPU utilization.
890
891NOTE:
892
893- The application sending UDP packets must support UDP segmentation offload.
894
895To enable/disable UDP Segmentation Offload, issue the following command::
896
897  # ethtool -K <ethX> tx-udp-segmentation [off|on]
898
899
900GNSS module
901-----------
The GNSS module requires a kernel compiled with CONFIG_GNSS=y or CONFIG_GNSS=m.
It allows the user to read messages from the GNSS hardware module and write
supported commands. If the module is physically present, a GNSS device is
spawned: ``/dev/gnss<id>``.
The protocol of the write commands depends on the GNSS hardware module, because
the driver passes the raw bytes written to the GNSS object on to the receiver
through i2c. Refer to the hardware GNSS module documentation for configuration
details.
909
910
911Firmware (FW) logging
912---------------------
The driver supports FW logging via the debugfs interface on PF 0 only. The FW
running on the NIC must support FW logging; if it does not, the 'fwlog'
directory will not be created in the ice debugfs directory.
916
917Module configuration
918~~~~~~~~~~~~~~~~~~~~
919Firmware logging is configured on a per module basis. Each module can be set to
920a value independent of the other modules (unless the module 'all' is specified).
921The modules will be instantiated under the 'fwlog/modules' directory.
922
923The user can set the log level for a module by writing to the module file like
924this::
925
926  # echo <log_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/<module>
927
928where
929
930* log_level is a name as described below. Each level includes the
931  messages from the previous/lower level
932
933      *	none
934      *	error
935      *	warning
936      *	normal
937      *	verbose
938
939* module is a name that represents the module to receive events for. The
940  module names are
941
942      *	general
943      *	ctrl
944      *	link
945      *	link_topo
946      *	dnl
947      *	i2c
948      *	sdp
949      *	mdio
950      *	adminq
951      *	hdma
952      *	lldp
953      *	dcbx
954      *	dcb
955      *	xlr
956      *	nvm
957      *	auth
958      *	vpd
959      *	iosf
960      *	parser
961      *	sw
962      *	scheduler
963      *	txq
964      *	rsvd
965      *	post
966      *	watchdog
967      *	task_dispatch
968      *	mng
969      *	synce
970      *	health
971      *	tsdrv
972      *	pfreg
973      *	mdlver
974      *	all
975
976The name 'all' is special and allows the user to set all of the modules to the
977specified log_level or to read the log_level of all of the modules.
978
979Example usage to configure the modules
980^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
981
982To set a single module to 'verbose'::
983
984  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link
985
To set multiple modules, issue the command once for each module::
987
988  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link
989  # echo warning > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/ctrl
990  # echo none > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb
991
992To set all the modules to the same value::
993
994  # echo normal > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all
995
996To read the log_level of a specific module (e.g. module 'general')::
997
998  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/general
999
1000To read the log_level of all the modules::
1001
1002  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all
1003
1004Enabling FW log
1005~~~~~~~~~~~~~~~
1006Configuring the modules indicates to the FW that the configured modules should
1007generate events that the driver is interested in, but it **does not** send the
1008events to the driver until the enable message is sent to the FW. To do this
1009the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An example
1010is::
1011
1012  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable
1013
1014Retrieving FW log data
1015~~~~~~~~~~~~~~~~~~~~~~
The FW log data can be retrieved by reading from 'fwlog/data'. The user can
write any value to 'fwlog/data' to clear the data. The data can only be cleared
when FW logging is disabled. The FW log data is binary and is meant to be sent
to Intel, where it is used to help debug user issues.
1020
1021An example to read the data is::
1022
1023  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin
1024
1025An example to clear the data is::
1026
1027  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
1028
1029Changing how often the log events are sent to the driver
1030~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The driver receives FW log data from the Admin Receive Queue (ARQ). How often
the FW sends the ARQ events can be configured by writing to
'fwlog/nr_messages'. The range is 1-128 (1 means push every log message, 128
means push only when the max AQ command buffer is full). The suggested value is
10. The user can see the configured value by reading 'fwlog/nr_messages'. An
example to set the value is::
1037
1038  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages
1039
1040Configuring the amount of memory used to store FW log data
1041~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The driver stores FW log data within the driver. The default size of the memory
used to store the data is 1MB. Some use cases may require more or less data, so
the user can change the amount of memory that is allocated for FW log data.
To change the amount of memory, write to 'fwlog/log_size'. The value must be
one of: 128K, 256K, 512K, 1M, or 2M. FW logging must be disabled to change the
value. An example of changing the value is::
1048
1049  # echo 128K > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/log_size
1050
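The steps in this section can be combined into a single capture run: configure
a module, enable logging, wait, disable logging, and save the data. The
following is a minimal sketch under stated assumptions, not an Intel-provided
tool. The ``FWLOG_DIR`` variable and the ``fwlog_capture`` function are
hypothetical names, and the default path reuses the example PCI address from
this section.

```shell
# Hypothetical wrapper around the fwlog debugfs files described above.
# FWLOG_DIR defaults to the example device used in this section; point it
# at the fwlog directory of your own PF 0.
FWLOG_DIR=${FWLOG_DIR:-/sys/kernel/debug/ice/0000:18:00.0/fwlog}

# fwlog_capture <module> <log_level> <seconds> <output_file>
fwlog_capture() {
    echo "$2" > "$FWLOG_DIR/modules/$1" || return 1  # configure one module
    echo 1 > "$FWLOG_DIR/enable" || return 1         # start event delivery
    sleep "$3"                                       # let events accumulate
    echo 0 > "$FWLOG_DIR/enable" || return 1         # stop event delivery
    cat "$FWLOG_DIR/data" > "$4"                     # save the binary log
}
```

For example, ``fwlog_capture link verbose 30 fwlog.bin`` would log the 'link'
module at the 'verbose' level for 30 seconds. The function must run as root,
and the resulting file is binary.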
1051
1052Performance Optimization
1053========================
1054Driver defaults are meant to fit a wide variety of workloads, but if further
1055optimization is required, we recommend experimenting with the following
1056settings.
1057
1058
1059Rx Descriptor Ring Size
1060-----------------------
1061To reduce the number of Rx packet discards, increase the number of Rx
1062descriptors for each Rx ring using ethtool.
1063
  Check if the interface is dropping Rx packets due to buffers being full
  (a nonzero rx_dropped.nic can mean that there is not enough PCIe bandwidth)::
1066
1067    # ethtool -S <ethX> | grep "rx_dropped"
1068
1069  If the previous command shows drops on queues, it may help to increase
1070  the number of descriptors using 'ethtool -G'::
1071
    # ethtool -G <ethX> rx <N>

  Where <N> is the desired number of ring entries/descriptors.
1074
1075  This can provide temporary buffering for issues that create latency while
1076  the CPUs process descriptors.
1077
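When an interface has many queues, the drop counters are easier to read if
zero values are filtered out. The sketch below does that for ``ethtool -S``
output. The function name ``nonzero_rx_drops`` is hypothetical, and the
"name: value" layout is assumed from common ethtool releases.

```shell
# Hypothetical helper: list only the rx_dropped counters that are nonzero.
# Reads "ethtool -S" output on stdin and prints "<counter> <value>" lines.
nonzero_rx_drops() {
    awk -F: '$1 ~ /rx_dropped/ && $2 + 0 > 0 {
        gsub(/[ \t]/, "", $1)   # strip the leading indentation
        print $1, $2 + 0
    }'
}
```

Running the check before and after a ring size change shows whether the
counters are still increasing. ``ethtool -g <ethX>`` reports the maximum ring
size that the device accepts.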
1078
1079Interrupt Rate Limiting
1080-----------------------
1081This driver supports an adaptive interrupt throttle rate (ITR) mechanism that
1082is tuned for general workloads. The user can customize the interrupt rate
1083control for specific workloads, via ethtool, adjusting the number of
1084microseconds between interrupts.
1085
1086To set the interrupt rate manually, you must disable adaptive mode::
1087
1088  # ethtool -C <ethX> adaptive-rx off adaptive-tx off
1089
1090For lower CPU utilization:
1091
1092  Disable adaptive ITR and lower Rx and Tx interrupts. The examples below
1093  affect every queue of the specified interface.
1094
1095  Setting rx-usecs and tx-usecs to 80 will limit interrupts to about
1096  12,500 interrupts per second per queue::
1097
1098    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 80 tx-usecs 80
1099
1100For reduced latency:
1101
1102  Disable adaptive ITR and ITR by setting rx-usecs and tx-usecs to 0
1103  using ethtool::
1104
1105    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
1106
1107Per-queue interrupt rate settings:
1108
1109  The following examples are for queues 1 and 3, but you can adjust other
1110  queues.
1111
1112  To disable Rx adaptive ITR and set static Rx ITR to 10 microseconds or
1113  about 100,000 interrupts/second, for queues 1 and 3::
1114
    # ethtool --per-queue <ethX> queue_mask 0xa --coalesce adaptive-rx off \
      rx-usecs 10
1117
1118  To show the current coalesce settings for queues 1 and 3::
1119
1120    # ethtool --per-queue <ethX> queue_mask 0xa --show-coalesce
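
The queue_mask value is a bitmask in which bit N selects queue N, so queues 1
and 3 give 0x2 | 0x8 = 0xa. A small sketch for computing the mask for other
queue sets follows. The function name ``queue_mask`` is hypothetical.

```shell
# Hypothetical helper: print the hexadecimal queue_mask for the queue
# indices given as arguments (bit N of the mask selects queue N).
queue_mask() {
    mask=0
    for q in "$@"; do
        mask=$(( mask | (1 << q) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, ``queue_mask 1 3`` prints the 0xa used in the commands above.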
1121
1122Bounding interrupt rates using rx-usecs-high:
1123
1124  :Valid Range: 0-236 (0=no limit)
1125
1126   The range of 0-236 microseconds provides an effective range of 4,237 to
1127   250,000 interrupts per second. The value of rx-usecs-high can be set
1128   independently of rx-usecs and tx-usecs in the same ethtool command, and is
1129   also independent of the adaptive interrupt moderation algorithm. The
1130   underlying hardware supports granularity in 4-microsecond intervals, so
1131   adjacent values may result in the same interrupt rate.
1132
1133  The following command would disable adaptive interrupt moderation, and allow
1134  a maximum of 5 microseconds before indicating a receive or transmit was
1135  complete. However, instead of resulting in as many as 200,000 interrupts per
1136  second, it limits total interrupts per second to 50,000 via the rx-usecs-high
1137  parameter.
1138
1139  ::
1140
    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs-high 20 \
      rx-usecs 5 tx-usecs 5
1143
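The interrupt rates quoted in this section follow from dividing one second
(1,000,000 microseconds) by the configured interval. The sketch below performs
that conversion so other values can be evaluated before applying them. The
function name ``irq_rate`` is hypothetical, and the result is an upper bound
per queue, rounded down.

```shell
# Hypothetical helper: print the approximate maximum interrupts per second
# per queue for an interval given in microseconds. An interval of 0 means
# that no limit is applied.
irq_rate() {
    if [ "$1" -eq 0 ]; then
        echo "no limit"
    else
        echo $(( 1000000 / $1 ))
    fi
}
```

For example, ``irq_rate 80`` prints 12500 and ``irq_rate 20`` prints 50000,
matching the figures given above for rx-usecs 80 and rx-usecs-high 20.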
1144
1145Virtualized Environments
1146------------------------
1147In addition to the other suggestions in this section, the following may be
1148helpful to optimize performance in VMs.
1149
1150  Using the appropriate mechanism (vcpupin) in the VM, pin the CPUs to
1151  individual LCPUs, making sure to use a set of CPUs included in the
1152  device's local_cpulist: ``/sys/class/net/<ethX>/device/local_cpulist``.
1153
1154  Configure as many Rx/Tx queues in the VM as available. (See the iavf driver
1155  documentation for the number of queues supported.) For example::
1156
1157    # ethtool -L <virt_interface> rx <max> tx <max>
1158
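For the CPU pinning step, the local_cpulist file reports CPUs as
comma-separated ranges (for example "0-3,8"), which is awkward to feed into a
pinning loop. The sketch below expands such a list into one CPU number per
line. The function name ``expand_cpulist`` is hypothetical, and the ``seq``
utility is assumed to be available.

```shell
# Hypothetical helper: expand a cpulist such as "0-3,8,10-11" into one
# CPU number per line. Requires the seq utility.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        seq "$first" "${last:-$first}"   # a single CPU has no range end
    done
}
```

Pass the contents of ``/sys/class/net/<ethX>/device/local_cpulist`` as the
argument, then pin each VM CPU to one of the printed CPU numbers using the
mechanism provided by your hypervisor.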
1159
1160Support
1161=======
1162For general information, go to the Intel support website at:
1163https://www.intel.com/support/
1164
1165If an issue is identified with the released source code on a supported kernel
1166with a supported adapter, email the specific information related to the issue
1167to intel-wired-lan@lists.osuosl.org.
1168
1169
1170Trademarks
1171==========
1172Intel is a trademark or registered trademark of Intel Corporation or its
1173subsidiaries in the United States and/or other countries.
1174
1175* Other names and brands may be claimed as the property of others.
1176