History log of /freebsd-current/sys/dev/ena/ena.h
Revision Date Author Comments
# 4e2688cc 30-Oct-2023 Osama Abboud <osamaabb@amazon.com>

ena: Update driver version to v2.7.0

Features:
* Introduce customer and SRD metrics through sysctl
* Introduce spreading IRQs to CPUs capability using sysctl
* Upgrade ena-com to v2.7.0

Bug Fixes:
* Remove outdated APIs

Minor Changes:
* Introduce a shared stats sample interval for all stats

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 246aa273 23-Oct-2023 Osama Abboud <osamaabb@amazon.com>

ena: Update the license dating to 2023

Some of the files are using outdated licenses.
Update the license to be 2023.

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 36d42c86 12-Sep-2023 Osama Abboud <osamaabb@amazon.com>

ena: Support srd metrics with sysctl

This commit introduces SRD metrics through sysctl.
The metrics can be queried using the following sysctl node:
sysctl dev.ena.<device index>.ena_srd_info

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# f97993ad 12-Sep-2023 Osama Abboud <osamaabb@amazon.com>

ena: Support customer metric with sysctl

This commit adds sysctl support for customer metrics.
Different customer metrics can be found in the following sysctl node:
sysctl dev.ena.<device index>.customer_metrics

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 5b925280 12-Sep-2023 Osama Abboud <osamaabb@amazon.com>

ena: Introduce shared sample interval for all stats

Rename sample_interval node to stats_sample_interval and move
it up in the sysctl tree to make it clear that it's relevant for
all the stats and not only ENI metrics (Currently, sample interval node
is found under eni_metrics node).

Path to node:
dev.ena.<device_index>.stats_sample_interval

Once this parameter is set it will set the sample interval for all the
stats node including SRD/customer metrics.

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# f9e1d947 30-Oct-2023 Osama Abboud <osamaabb@amazon.com>

ena: Add sysctl support for spreading IRQs

This commit allows spreading IO IRQs over different CPUs through sysctl.
Two sysctl nodes are introduced:
1- base_cpu: serves as the first CPU to which the first IO IRQ
will be bound.
2- cpu_stride: sets the distance between every two CPUs to which every
two consecutive IO IRQs are bound.

For example, to obtain the following IO IRQ / CPU binding:

IRQ idx | CPU
----------------
1 | 0
2 | 2
3 | 4
4 | 6

Run the following commands:
sysctl dev.ena.<device index>.irq_affinity.base_cpu=0
sysctl dev.ena.<device_index>.irq_affinity.cpu_stride=2
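
The mapping in the table above can be sketched in C as follows. This is only
an illustration: the helper name, the 0-based IRQ position and the wrap-around
at the number of CPUs are assumptions of the sketch, not taken from the driver.

```c
/*
 * Hypothetical helper: pick the CPU for the IO IRQ at 0-based position
 * irq_pos (IRQ idx 1 in the table above is position 0).
 * Wrapping modulo ncpus is an assumption of this sketch.
 */
static int
irq_pos_to_cpu(int base_cpu, int cpu_stride, int irq_pos, int ncpus)
{
    return ((base_cpu + irq_pos * cpu_stride) % ncpus);
}
```

With base_cpu=0 and cpu_stride=2 on an 8-CPU system, positions 0..3 map to
CPUs 0, 2, 4 and 6, which matches the table.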

Also introduced rss_enabled field, which is intended to replace
'#ifdef RSS' in multiple places, in order to prevent code duplication.

We want to bind interrupts to CPUs in case of rss set OR in case
the newly defined sysctl parameter is set. This requires removing a
couple of '#ifdef RSS' as well in the structs, since we'll be using the
relevant parameters in the CPU binding code.

Approved by: cperciva (mentor)
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# ac40021c 28-May-2023 Arthur Kiyanovski <akiyano@amazon.com>

ena: Update driver version to v2.6.3

Bug Fixes:
* Initialize statistics before the interface is available
* Fix driver unload crash

Minor Changes:
* Mechanically convert ena(4) to DrvAPI
* Remove usage of IFF_KNOWSEPOCH

MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# e5de1d8d 13-Dec-2022 Arthur Kiyanovski <akiyano@amazon.com>

ena: Update driver version to v2.6.2

Bug Fixes:
* Remove timer service re-arm on ena_restore_device failure.
* Re-Enable per-packet missing tx completion print

Minor Changes:
* Switch driver owners from Semihalf to Amazon in man file.

MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Pull Request: https://github.com/freebsd/freebsd-src/pull/637


# 25b64933 04-Jul-2022 Michal Krawczyk <mk@semihalf.com>

ena: Update driver version to v2.6.1

Minor version update which improves the styling of printouts, fixes
the KASAN and KMSAN kernel builds and LLQ reconfiguration after the
device reset.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# b72f1f45 30-Jun-2022 Mark Johnston <markj@FreeBSD.org>

ena: Make first_interrupt a uint8_t

We do not have atomic(9) routines for bools, and it is not guaranteed
that sizeof(bool) is 1.

This fixes the KASAN and KMSAN kernel builds, which fail because the
compiler refuses to silently cast a _Bool * to a uint8_t * when calling
the atomic(9) sanitizer interceptors.
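
A userland analogue of the idea, using C11 atomics instead of the kernel's
atomic(9) routines, might look like the sketch below. The struct and function
names are hypothetical; the point is only that the flag is stored in a
fixed-width uint8_t so that byte-sized atomic accesses are well defined.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ring state; only the flag relevant to this sketch. */
struct ring_sketch {
    _Atomic uint8_t first_interrupt; /* fixed 1-byte width, unlike bool */
};

/* Record that the first interrupt has been observed. */
static void
ring_mark_first_interrupt(struct ring_sketch *r)
{
    atomic_store(&r->first_interrupt, 1);
}

/* Return non-zero once the first interrupt has been recorded. */
static int
ring_seen_first_interrupt(struct ring_sketch *r)
{
    return (atomic_load(&r->first_interrupt) != 0);
}
```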

Reviewed by: Dawid Górecki <dgr@semihalf.com>
MFC after: 2 weeks
Fixes: 0ac122c388d9 ("ena: Use atomic_load/store functions for first_interrupt variable")
Differential Revision: https://reviews.freebsd.org/D35683


# 79e15002 10-Jun-2022 Michal Krawczyk <mk@semihalf.com>

ena: Update driver version to v2.6.0

Some of the changes in this release:
* Style fixes
* Fix ENI stats probing
* Add trace for the last Tx cleanup call
* Prevent LLQ initialization if member isn't exposed
* Improve logging

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 8f15f8a7 10-Jun-2022 Dawid Gorecki <dgr@semihalf.com>

ena: Align names of constants

Most of the constants in ena.h file were prefixed with ENA_*, while
others did not have this prefix. Align the constants by prefixing the
remaining constants with ENA.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 82e558ea 10-Jun-2022 Dawid Gorecki <dgr@semihalf.com>

ena: Fix styling issues

Align code style with FreeBSD style(9) guidelines.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# b899a02a 10-Jun-2022 Dawid Gorecki <dgr@semihalf.com>

ena: Move ena_copy_eni_metrics into separate task

Copying ENI metrics was done in callout context, this caused the driver
to panic when sample_interval was set to a value other than 0, as the
admin queue call which was executed could sleep while waiting on
a condition variable. Taskqueue, unlike callout, allows for sleeping, so
moving the function to a separate taskqueue fixes the problem.
ena_timer_service is still responsible for scheduling the taskqueue.

Stop draining the callout during ena_up/ena_down. This was done to
prevent a race between ena_up/down and ena_copy_eni_metrics admin queue
calls. Since ena_metrics_task is protected by ENA_LOCK there is no
possibility of a race between ena_up/down and ena_metrics_task.

Remove a comment about locking in ena_timer_service. With ENI metrics
in a separate task this comment became obsolete.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# d8aba82b 10-Jun-2022 Dawid Gorecki <dgr@semihalf.com>

ena: Store ticks of last Tx cleanup

Store timestamp of last cleanup in Tx ring structure. This does not
change anything during normal operation of the driver but could be
useful when the device fails for some reason.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 3501d4f1 10-Jun-2022 Dawid Gorecki <dgr@semihalf.com>

ena: Add ena_ring_tx_doorbell() function

Add ena_ring_tx_doorbell function to remove code duplication.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 8a5b4859 03-Jan-2022 Michal Krawczyk <mk@semihalf.com>

ena: update ENA version to v2.5.0

Some of the changes in this release:
- IPv6 L4 checksum offload fixes.
- Optimization of the Tx req_id validation.
- Timer service adjustments.
- NUMA awareness for the kernel RSS mode.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 78554d0c 03-Jan-2022 Dawid Gorecki <dgr@semihalf.com>

ena: start timer service on attach

The timer service was started when the interface was brought up and it
was stopped when it was brought down. Since ena_up requires the device
to be responsive, triggering the reset would become impossible if the
device became unresponsive with the interface down.

Since most of the functions in timer service already perform the check
to see if the device is running, this only requires starting the callout
in attach and stopping it when bringing the interface up or down to
avoid race between different admin queue calls.

Since callout functions for timer service are always called with the
same arguments, replace callout_{init,reset,drain} calls with
ENA_TIMER_{INIT,RESET,DRAIN} macros.

Submitted by: Dawid Gorecki <dgr@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 42c7760b 12-Aug-2021 Michal Krawczyk <mk@semihalf.com>

ena: Update driver version to v2.4.1

Some of the changes in this release:
* Hardware RSS hash key reconfiguration and indirection table
reconfiguration support.
* Full kernel RSS support.
* Extra statistic counters.
* Netmap support for ENAv3.
* Locking assertions.
* Extra log messages.
* Reset handling fixes.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 6d1ef2ab 12-Aug-2021 Artur Rojek <ar@semihalf.com>

ena: Implement full RSS reconfiguration

Bind RX/TX queues and MSI-X vectors to matching CPUs based on the RSS
bucket entries.

Introduce sysctls for the following RSS functionality:
- rss.indir_table: indirection table mapping
- rss.indir_table_size: indirection table size
- rss.key: RSS hash key (if Toeplitz used)

Said sysctls are only available when compiled without `option RSS`, as
kernel-side RSS support currently doesn't offer RSS reconfiguration.

Migrate the hash algorithm from CRC32 to Toeplitz and change the initial
hash value to 0x0 in order to match the standard Toeplitz implementation.
Provide helpers for hash key inversion required for HW operations.
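
For reference, a textbook Toeplitz hash with an initial value of 0x0 can be
sketched as below. This is not the driver's code: the function name is
hypothetical, the key is the widely published RSS verification key and is
used here only as sample data, and the key inversion needed by the hardware
is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample 40-byte key (assumption: used here purely as test data). */
static const uint8_t sketch_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/*
 * Standard Toeplitz hash: for every set bit of the input (MSB first),
 * XOR in the 32-bit window of the key that starts at that bit position.
 * key_len must be at least 4; key bits past the end are treated as 0.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
    const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0; /* initial hash value 0x0 */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
        ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (size_t i = 0; i < data_len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window left by one key bit. */
            size_t kbit = i * 8 + (size_t)(7 - bit) + 32;
            window <<= 1;
            if (kbit / 8 < key_len &&
                (key[kbit / 8] & (0x80u >> (kbit % 8))))
                window |= 1;
        }
    }
    return (hash);
}
```

Because the hash is linear over XOR, an input with only its first bit set
hashes to the first 32 bits of the key, which is a convenient sanity check.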

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 223c8cb1 12-Aug-2021 Artur Rojek <ar@semihalf.com>

ena: Add missing statistics

Provide the following sysctl statistics in order to stay aligned with
the Linux driver:
* rx_ring.csum_good
* tx_ring.unmask_interrupt_num

Also rename the 'bad_csum' statistic name to 'csum_bad' for alignment.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 07aff471 12-Aug-2021 Artur Rojek <ar@semihalf.com>

ena: Share ena_global_lock between driver instances

In order to use `ena_global_lock` in sysctl context, it must be kept
outside the driver instance's software context, as sysctls can be called
before attach and after detach, leading to lock use before sx_init and
after sx_destroy otherwise.
Solve this issue by turning `ena_global_lock` into a file scope
variable, shared between all instances of the driver and associated
sysctl context, and in turn initialized/destroyed in dedicated
SYSINIT/SYSUNINIT functions.
As a side effect, this change also fixes existing race in the reset
routine, when simultaneously accessing sysctl exposed properties.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 986e7b92 12-Aug-2021 Artur Rojek <ar@semihalf.com>

ena: Move RSS logic into its own source files

Delegate RSS related functionality into separate .c/.h files in
preparation for the full RSS support.

While at it, reorder functions and remove prototypes for ones with
internal linkage.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# cb98c439 12-Aug-2021 Artur Rojek <ar@semihalf.com>

ena: Add locking assertions

ENA silently assumed that the ena_up, ena_down and ena_start_xmit
routines are called within a locked context. The driver's logic makes
strong assumptions about concurrent access to those routines, so for
safety and better documentation of this assumption, locking assertions
were added to the above functions.

The assertion was added only for the main steps (skipping the helper
functions) which can be called from multiple places including the kernel
and the driver itself.

Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 93f0df45 24-Jun-2021 Marcin Wojtas <mw@FreeBSD.org>

Update ENA version to v2.4.0

Some of the changes in this release:
* Large LLQ headers,
* Bug/stability fixes,
* Change of the README/Documentation.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 0e7d31f6 14-Jun-2021 Marcin Wojtas <mw@FreeBSD.org>

ena: hide sysctl nodes for unused ENA queues

IO queue related attributes are registered statically at driver attach
with the rest of the ENA specific sysctl nodes. However, the number of
queues can be changed at runtime via the `ena_sysctl_io_queues_nb`
request, leading to a potential exposure of attributes for non-existing
queues.

Introduce a new `ena_sysctl_update_queue_node_nb` function, which
updates the sysctl nodes after the number of queues is altered.
This happens by either registering or unregistering node specific oids,
based on a delta between the previous and current queue count.

NOTE: All unregistered oids must be registered again before the driver
detach, e.g. by another call to this function.

Submitted by: Artur Rojek <ar@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 1c808fcd 18-Feb-2021 Michal Krawczyk <mk@semihalf.com>

Allocate BAR for ENA MSIx vector table

In the new ENA-based instances like c6gn, the vector table moved to a
new PCIe bar - BAR1. Previously it was always located on the BAR0, so
the resources were already allocated together with the registers.

As FreeBSD does not do any resource allocation behind the scenes,
the driver is responsible for allocating them explicitly, before other
parts of the OS (like the PCI code allocating MSIx) are able to
access them.

To dynamically determine the BAR on which the MSIx vector table is
present, pci_msix_table_bar() is used, and the new BAR is allocated if
needed.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 3 days


# 7dee315e 18-Nov-2020 Marcin Wojtas <mw@FreeBSD.org>

Update ENA driver version to v2.3.0

The v2.3.0 introduces new ena_com layer, ENI metrics updates and SPDX
license tags.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27120


# 7d2e6f20 18-Nov-2020 Marcin Wojtas <mw@FreeBSD.org>

Rename descriptions of the supported ENA devices

Some of the PCI IDs were described as ENA with LLQ support - it's not
fully accurate and because of that, their names were changed.

Instead of LLQ, use RSERV0 for the description of those devices.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27119


# f180142c 18-Nov-2020 Marcin Wojtas <mw@FreeBSD.org>

Add ENI metrics for the ENA driver

The new HAL allows the driver to read extra ENI stats. Exact meaning of
each of them can be found in base/ena_defs/ena_admin_defs.h file and
structure ena_admin_eni_stats.

Those stats are being updated inside of the timer service, which is
executed every second.
ENI metrics are turned off by default. They can be enabled, using the
sysctl node: dev.ena.X.eni_metrics.update_delay
0 value in this node means that the update is turned off. Other values
determine how many seconds must pass, before ENI metrics will be
updated.

They can be acquired, using sysctl:

sysctl dev.ena.X.eni_metrics

Where X stands for the interface number.
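
The interaction between the one-second timer service and the update delay
can be sketched as follows. The struct and function names are hypothetical,
and the counter-based approach is an assumption of this sketch rather than
the driver's actual bookkeeping.

```c
#include <stdint.h>

/* Hypothetical bookkeeping for the periodic metrics refresh. */
struct metrics_timer {
    uint32_t update_delay; /* seconds between updates; 0 disables */
    uint32_t elapsed;      /* seconds since the last update */
};

/*
 * Called once per second from the timer service.
 * Returns 1 when the metrics should be refreshed on this tick.
 */
static int
metrics_tick(struct metrics_timer *t)
{
    if (t->update_delay == 0) {
        t->elapsed = 0;
        return (0);
    }
    if (++t->elapsed >= t->update_delay) {
        t->elapsed = 0;
        return (1);
    }
    return (0);
}
```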

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27118


# 0835cc78 18-Nov-2020 Marcin Wojtas <mw@FreeBSD.org>

Add SPDX license tag to the ENA driver files

Referring to the guide at https://wiki.freebsd.org/SPDX, the SPDX tag should not
replace the standard license text, however it should be added over the
standard license text to make the automation easier.

Because of that, the old license was kept, but the SPDX tag was added
on top of every ENA driver file.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27117


# 2287afd8 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Update ENA driver version to v2.2.0

Driver version upgrade is connected with support for the new device
features, like Tx drops reporting or disabling meta caching.

Moreover, the driver configuration from the sysctl was reworked to
provide safer and better flow for configuring:
* number of IO queues (new feature),
* drbr size on Tx,
* Rx queue size.

Moreover, a lot of minor bug fixes and improvements were added.

Copyright date in the license of the modified files in this release was
updated to 2020.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 0b432b70 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Allow disabling meta caching for ENA Tx path

Determined by a flag passed from the device. No metadata is set within
ena_tx_csum when caching is disabled.

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 9762a033 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Create ENA IO queues with optional backoff

If the requested size of the IO queues is not supported, try to decrease
it until finding the highest value that can be satisfied.
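
A backoff loop of this kind could be sketched as below. The names and the
callback are hypothetical, and halving the size on each failed attempt is an
assumption of the sketch; the message above only says that the size is
decreased.

```c
#include <stdint.h>

/* Hypothetical callback: returns 0 if queues of `size` can be created. */
typedef int (*create_fn)(uint32_t size, void *arg);

/*
 * Try the requested size first and halve it after every failure.
 * Returns the size that succeeded, or 0 if none down to min_size did.
 */
static uint32_t
create_with_backoff(uint32_t requested, uint32_t min_size,
    create_fn create, void *arg)
{
    for (uint32_t size = requested; size > 0 && size >= min_size;
        size /= 2) {
        if (create(size, arg) == 0)
            return (size);
    }
    return (0);
}

/* Stand-in for the device: accepts any size up to the limit in arg. */
static int
fake_create(uint32_t size, void *arg)
{
    uint32_t limit = *(uint32_t *)arg;

    return (size <= limit ? 0 : -1);
}
```

For example, requesting 4096 entries from a fake device limited to 1000
tries 4096, 2048 and 1024 before settling on 512.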

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 56d41ad5 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Add sysctl node for ENA IO queues number adjustment

By default, in ena_attach() the driver attempts to acquire
ena_adapter::max_num_io_queues MSI-X vectors for the purpose of IO
queues, however this is not guaranteed. The number of vectors acquired
depends also on system resources availability.

Regardless of that, enable the number of effectively used IO queues to
be further limited through the sysctl node.

Example: Assuming that there are 8 IO queues configured by default, the
command

$ sysctl dev.ena.0.io_queues_nb=4

will reduce the number of available IO queues to 4. Similarly, the value
can be also increased up to maximum supported value. A value higher than
maximum supported number of IO queues is ignored. Zero is ignored too.
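
The acceptance rule described above can be sketched as a small helper. The
function name is hypothetical and the real sysctl handler does more work
(locking, reconfiguring the device); only the range check is illustrated.

```c
#include <stdint.h>

/*
 * Return the number of IO queues to use after a request.
 * Zero and values above the supported maximum are ignored,
 * in which case the current value is kept.
 */
static uint32_t
apply_io_queues_nb(uint32_t current, uint32_t requested,
    uint32_t max_supported)
{
    if (requested == 0 || requested > max_supported)
        return (current);
    return (requested);
}
```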

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 21823546 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Rework ENA Tx buffer ring size reconfiguration

This method has been aligned with the way the Rx queue size is being
updated - so it's now done synchronously instead of resetting the
device.

Moreover, the input parameter is now validated to be a power of 2.
Without this check, an invalid value can cause a kernel panic.
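
A power-of-2 check of this kind is commonly written with a single bit trick.
The sketch below is a generic illustration with a hypothetical function name,
not the driver's validation code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A non-zero value is a power of two exactly when it has a single bit
 * set, so clearing the lowest set bit with x & (x - 1) must yield zero.
 */
static bool
is_power_of_two(uint32_t x)
{
    return (x != 0 && (x & (x - 1)) == 0);
}
```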

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 7d8c4fee 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Rework ENA Rx queue size configuration

This patch reworks how the Rx queue size is being reconfigured and how
the information from the device is being processed.

Reconfiguring the queues and then resetting the device in order to
apply the changes isn't the best approach. It can be done synchronously,
which also makes it possible to tell the user whether the
reconfiguration was successful. It is now done in the
ena_update_queue_size() function.

To avoid reallocation of the ring buffer, statistic counters and the
reinitialization of the mutexes when only new size has to be assigned,
the io queues initialization function has been split into 2 stages:
basic, which is just copying appropriate fields and the advanced, which
allocates and inits more advanced structures for the IO rings.

Moreover, now the max allowed Rx and Tx ring size is being kept
statically in the adapter and the size of the variables holding those
values has been changed to uint32_t everywhere.

Information about IO queues size is now being logged in the up routine
instead of the attach.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 02a2a7ce 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Expose argument names for non static ENA driver functions

As functions which are declared in the header files are intended to be
the interface and are going to be used by other files, it's better to
include argument names in the declaration, so the caller won't have to
check the .c file in order to check their meaning and order.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 6959869e 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Use single global lock in the ENA driver

Previously, the driver had 2 global locks - one was an sx lock used for
up/down synchronization and the second one was a mutex, which was used
for link configuration and timer service callout.

It is better to have a single lock for that. We cannot use a mutex, as
the up/down configuration can sleep while holding the lock, which causes
witness errors, so an sx lock seems to be the only choice.

Callout cannot use sx lock, but the timer service is MP safe, so we just
need to avoid race between ena_down() and ena_detach(). It can be
avoided by acquiring sx lock.

Simple macros were added that encapsulate the implementation of the
lock and make the code cleaner.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 7926bc44 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Add trigger reset function in the ENA driver

As the reset triggering is no longer a simple macro that was just
setting appropriate flag, the new function for triggering reset was
added. It improves code readability a lot, as we are avoiding additional
indentation.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 6c84cec3 26-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Enable Tx drops reporting in the ENA driver

Tx drops statistics are fetched from HW every ena_keepalive_wd() call
and are observable using one of the commands:
* sysctl dev.ena.0.hw_stats.tx_drops
* netstat -I ena0 -d

Submitted by: Maciej Bielski <mba@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 04cf2b88 07-May-2020 Marcin Wojtas <mw@FreeBSD.org>

Optimize ENA Rx refill for low memory conditions

Sometimes, especially when there is not much memory in the system left,
allocating mbuf jumbo clusters (like 9KB or 16KB) can take a lot of time
and it is not guaranteed that it'll succeed. In that situation, the
fallback will work, but if the refill needs to take place for a lot of
descriptors at once, the time spent in m_getjcl looking for memory can
cause system unresponsiveness due to high priority of the Rx task. This
can also lead to driver reset, because Tx cleanup routine is being
blocked and timer service could detect that Tx packets aren't cleaned
up. The reset routine can further create another unresponsiveness - Rx
rings are being refilled there, so m_getjcl will again burn the CPU.
This was causing NVMe driver timeouts and resets, because the network
driver has higher priority.

Instead of 16KB jumbo clusters for the Rx buffers, 9KB clusters are
enough - ENA MTU is being set to 9K anyway, so it's very unlikely that
more space than 9KB will be needed.

However, 9KB jumbo clusters can still cause issues, so by default the
page size mbuf cluster will be used for the Rx descriptors. This can have a
small (~2%) impact on the throughput of the device, so to restore
original behavior, one must change sysctl "hw.ena.enable_9k_mbufs" to
"1" in "/boot/loader.conf" file.

As a part of this patch (important fix), the version of the driver
was updated to v2.1.2.

Submitted by: cperciva
Reviewed by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: Ido Segev <idose@amazon.com>
Reviewed by: Guy Tzalik <gtzalik@amazon.com>
MFC after: 3 days
PR: 225791, 234838, 235856, 236989, 243531
Differential Revision: https://reviews.freebsd.org/D24546


# 888810f0 24-Feb-2020 Marcin Wojtas <mw@FreeBSD.org>

Rework and simplify Tx DMA mapping in ENA

Driver working in LLQ mode in some cases can send only the last few segments
of the mbuf using DMA engine, and the rest of them are sent to the
device using direct PCI transaction. To map only the necessary data, two DMA
maps were used. That solution was very rough and was causing a bug - if
both maps were used (head_map and seg_map), there was a race in between
two flows on two queues and the device was receiving corrupted
data which could be further received on the other host if the Tx cksum
offload was enabled.

As it's ok to map whole mbuf and then send to the device only needed
segments, the design was simplified to use only single DMA map.

The driver version was updated to v2.1.1 as this is an important bug fix.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.


# 1c028d72 01-Nov-2019 Warner Losh <imp@FreeBSD.org>

Make validate_rx_req_id static inline because it uses other static
inline functions. gcc complains about this, most likely due to
the subtle differences between inline and static inline functions
defined in headers.


# 2731abe8 31-Oct-2019 Marcin Wojtas <mw@FreeBSD.org>

Update ENA version to v2.1.0

In this release the netmap support was introduced.

Moreover, it is also now possible to use the LLQ mode of the driver on
the arm64 AWS instances (A1 type).

Differential Revision: https://reviews.freebsd.org/D21938
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 6f2128c7 31-Oct-2019 Marcin Wojtas <mw@FreeBSD.org>

Add support for ENA NETMAP Tx

Two new tables are added to ena_tx_buffer structure:
* netmap_map_seg stores DMA mapping structures,
* netmap_buf_idx stores buff indexes taken from the slots.

When Tx resources are being set, the new mapping structures are created
and netmap Tx rings are being reset.

When Tx resources are being released, used netmap bufs are unmapped from
DMA and then mapping structures are destroyed.

When Tx interrupt occurs, ena_netmap_tx_irq is called.

ena_netmap_txsync callback signals that there are new packets which
should be transmitted.
First, it fills ena_netmap_ctx. Then it performs two actions:
* ena_netmap_tx_frames moves packets from netmap ring to NIC,
* ena_netmap_tx_cleanup restores buffers from NIC and gives them back
to the userspace app.
0 is returned in case of Tx error that could be handled by the driver.

ena_netmap_tx_frames checks if there are packets ready for transmission.
Then, for each of them, ena_netmap_tx_frame is called. If an error occurs,
transmitting is stopped, but if the error was caused by the HW ring being
full, information about that is not propagated to the userspace app.
When all packets are ready, doorbell is written to NIC and netmap ring
state is updated.

Parsing of one packet is done by the ena_netmap_tx_frame function.
First, it checks if number of slots does not exceed NIC limit. Invalid
packets are being dropped and the error is propagated to the upper
layer. As each netmap buffer has equal size, which is typically greater
than 2KiB, there shouldn't be any packets which contain too many slots.
Then, the ena_com_tx_ctx structure is being filled. As netmap does not
support any hardware offloads, ena_com_tx_meta structure is set to zero.
After that, ena_netmap_map_slots maps all memory slots for DMA.
If the device works in the LLQ mode, the push header is being determined
by checking if the header fits within the first slot.
If so, the portion of data is being copied directly from the slot.
Otherwise, the data is copied to the intermediate buffer.
First slots are treated the same as the others, because DMA mapping
has no impact on LLQ mode. Index of each netmap buffer is taken from
slot and stored in netmap_buf_idx array. In case of mapping error,
memory is unmapped and packets are put back to the netmap ring.

ena_netmap_tx_cleanup performs out of order cleanup of sent buffers.
First, req_id is taken and is validated. As validate_tx_req_id from
ena.c is specific to kernel mbufs, another implementation is provided.
Each req_id is cleaned up by ena_netmap_tx_clean_one function. Buffers
are being unmapped from DMA and put back to netmap ring. In the end,
the state of the netmap and NIC rings is updated.

Differential Revision: https://reviews.freebsd.org/D21936
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 9a0f2079 31-Oct-2019 Marcin Wojtas <mw@FreeBSD.org>

Add support for ENA NETMAP Rx

Most of code used for Rx ring initialization could be reused in NETMAP.
Reset of NETMAP ring and new alloc method was added. The driver decides
whether to use kernel mbufs or NETMAP slots based on the IFCAP_NETMAP
flag. This allows reusing ena_refill_rx_bufs, which provides proper
handling of Rx out of order completion.

ena_netmap_alloc_rx_slot takes exactly the same arguments as
ena_alloc_rx_mbuf, but instead of allocating one mbuf it takes one slot
from NETMAP ring. Based on queue id proper netmap_ring is found. As
NETMAP provides the "partial opening" feature not all of the rings are
available; an unused one points to an invalid ring. If there is an available slot,
it is taken from the ring. Its buffer is mapped to DMA and its index is
stored in ena_rx_buffer field in ena_rx_buffer structure. Then ena_buf
is filled with addresses and ring state is updated.

Cleanup is handled by ena_netmap_free_rx_slot. It unmaps DMA and returns
buffer to ring. As we could not return more bufs than we have taken and
we should not override occupied slots, buf_index should be 0. It is
being checked by assertion.

ena_netmap_rxsync callback puts received packets back to NETMAP ring and
passes them to user space by updating ring pointers. First it fills
ena_netmap_ctx.
Then it performs two actions:
* ena_netmap_rx_frames moves received frames from NIC to NETMAP ring,
* ena_netmap_rx_cleanup fills NIC ring with slots released by userspace
app.

In case of Rx error that could be handled by NIC driver (for example by
performing reset) rx sync should return 0.

ena_netmap_rx_frames first checks if NETMAP ring is in consistent
state and then in the loop receives new frames. When all available
frames are taken nr_hwtail is updated.

Receiving one frame is handled by ena_netmap_rx_frame. If no error
occurs, each descriptor is loaded by the ena_netmap_rx_load_desc function.
If a packet takes more than one segment, the NS_MOREFRAG flag must be set
in every slot except the last one. In case of a wrong req_id, the packet is
removed from the NETMAP ring. If the packet is successfully received,
counters are updated.
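
The NS_MOREFRAG rule for multi-segment packets can be sketched as follows.
The slot struct, the function name and the locally defined flag value are
stand-ins so that the example compiles outside of netmap; real code would use
struct netmap_slot and NS_MOREFRAG from the netmap headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for netmap's NS_MOREFRAG flag (assumption of this sketch). */
#define SKETCH_NS_MOREFRAG 0x0020

/* Simplified stand-in for a netmap slot. */
struct sketch_slot {
    uint16_t len;
    uint16_t flags;
};

/*
 * Mark the slots of one packet that spans nsegs consecutive slots:
 * every slot but the last one carries the "more fragments" flag.
 */
static void
mark_fragments(struct sketch_slot *slots, size_t nsegs)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (i + 1 < nsegs)
            slots[i].flags |= SKETCH_NS_MOREFRAG;
        else
            slots[i].flags &= (uint16_t)~SKETCH_NS_MOREFRAG;
    }
}
```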

Refilling of the NIC ring is performed by the ena_netmap_rx_cleanup function.
It calculates the number of available slots and calls ena_refill_rx_bufs with
the proper number.

Differential Revision: https://reviews.freebsd.org/D21935
Submitted by: Rafal Kozik <rk@semihalf.com>
Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 38c7b965 31-Oct-2019 Marcin Wojtas <mw@FreeBSD.org>

Split Rx/Tx from initialization code in ENA driver

Move Rx/Tx routines to separate file.
Some functions:
* ena_restore_device,
* ena_destroy_device,
* ena_up,
* ena_down,
* ena_refill_rx_bufs
could be reused in upcoming netmap code in the driver. To make it
possible, they were moved to ena.h header.

Differential Revision: https://reviews.freebsd.org/D21933
Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 9d0073e4 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Update ENA version to v2.0.0

ENAv2 introduces many new features, bug fixes and improvements.

Main new features are LLQ (Low Latency Queues) and independent queues
reconfiguration using sysctl commands.

The year in copyright notice was updated to 2019.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 32f63fa7 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Split ENA reset routine into restore and destroy stages

For alignment with Linux driver and better handling ena_detach(), the
reset is now calling ena_device_restore() and ena_device_destroy().

The ena_device_destroy() is also being called on ena_detach(), so the
code will be more readable.

The watchdog is now activated after reset only if it was active before.

Additional checks were added to ensure that there is no race with the
link state change AENQ handler.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# fd43fd2a 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Use bitfield for storing global ENA device states

As the ENA can have multiple states turned on/off, it is more convenient
to store them in single bitfield instead of multiple boolean variables.

The bitset FreeBSD API was used for the bitfield implementation, as it
provides flexible structure together with API which also supports atomic
bitfield operations.

For better readability basic macros from API were wrapped into custom
ENA_FLAG_* macros, which are filling up common parameters for all calls.
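
The wrapping idea can be sketched as below. This is an illustration
only: it uses C11 <stdatomic.h> on a single word instead of the FreeBSD
bitset API that the commit refers to, and the SKETCH_* names, the flag
list and struct sketch_adapter are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device states; the real driver defines its own list. */
enum sketch_flag {
	SKETCH_FLAG_DEVICE_RUNNING,
	SKETCH_FLAG_LINK_UP,
	SKETCH_FLAG_COUNT
};

/* All flags must fit into the single word used as storage below. */
static_assert(SKETCH_FLAG_COUNT <= sizeof(unsigned) * CHAR_BIT,
    "too many flags for one unsigned word");

struct sketch_adapter {
	atomic_uint flags;	/* one bit per sketch_flag value */
};

/* Wrappers that fill in the common parameters for every call site. */
#define SKETCH_FLAG_SET(a, f) \
	((void)atomic_fetch_or(&(a)->flags, 1u << (f)))
#define SKETCH_FLAG_CLEAR(a, f) \
	((void)atomic_fetch_and(&(a)->flags, ~(1u << (f))))
#define SKETCH_FLAG_ISSET(a, f) \
	((atomic_load(&(a)->flags) & (1u << (f))) != 0)
```

Compared with separate boolean fields, each state change is a single
atomic read-modify-write on the shared word.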

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# af66d7d0 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Add additional doorbells on ENA Tx path

The new ENA HAL is introducing API, which can determine on Tx path if
the doorbell is needed.

That way, it can tell the driver that it should ring a doorbell.
The old threshold value was not removed, because not all HW supports
this feature; instead, it was reworked to also work with the new API.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 82f5a792 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Limit maximum size of Rx refill threshold in ENA

The Rx ring size can be as high as 8k. Because of that we want to limit
the cleanup threshold by maximum value of 256.
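
A minimal sketch of such a clamp is shown below. Only the cap of 256
comes from this commit message; the divider of 8 and all SKETCH_* names
are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed: the threshold is a fraction of the ring size. */
#define SKETCH_RX_REFILL_THRESH_DIVIDER	8u
/* From the commit message: the threshold is capped at 256. */
#define SKETCH_RX_REFILL_THRESH_MAX	256u

/* Return the refill threshold for a ring with ring_size descriptors. */
static unsigned
sketch_rx_refill_threshold(unsigned ring_size)
{
	unsigned thresh;

	assert(ring_size > 0);
	thresh = ring_size / SKETCH_RX_REFILL_THRESH_DIVIDER;

	return (thresh < SKETCH_RX_REFILL_THRESH_MAX ?
	    thresh : SKETCH_RX_REFILL_THRESH_MAX);
}
```

Under these assumptions a 1024-entry ring gets a threshold of 128, while
an 8k ring is capped at 256 instead of 1024.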

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 4fa9e02d 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Add support for the LLQv2 and WC in ENA

LLQ (Low Latency Queue) is a feature that allows pushing the header
directly to the device through PCI even before DMA is triggered.

It reduces latency, because the device can start preparing the packet
before the payload is sent through DMA.

To speed up sending data through PCI, Write Combining is enabled, which
allows hardware to buffer data before sending it over PCI. This reduces
the number of PCI IO operations.

ENAv2 is using special descriptor for the negotiation of the LLQ.
Currently, only the default configuration is supported.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 5cb9db07 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Lock optimization in ENA

Handle IO interrupts using filter routine. That way, the main cleanup
task could be moved to the separate thread using taskqueue.

The deferred Rx cleanup task was removed, and now the cleanup task is
being called instead. That way, the Rx lock could be removed.

In addition, Queue management (wake up and stop TX ring) was added, so
the TX cleanup task can be performed mostly lockless.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 6064f289 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Add tuneable drbr ring size and hw queues depth for ENA

The driver now supports per adapter tuning of buffer ring size and HW Rx
ring size.

It can be achieved using sysctl node dev.ena.X.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# d12f7bfc 30-May-2019 Marcin Wojtas <mw@FreeBSD.org>

Check for missing MSI-x and Tx completions in ENA

If the first MSI-x won't be executed, then the timer service will detect
that and trigger device reset.

The checking for missing Tx completion was reworked, so it will also
check for missing interrupts. Checking number of missing Tx completions
can be performed after loop, instead of checking it every iteration.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# c2e7e247 21-Mar-2019 Marcin Wojtas <mw@FreeBSD.org>

Prevent double activation of admin interrupt in ENA

The resource is already being activated in the bus_alloc_resource(),
because the flag RF_ACTIVE is being passed.

Double activation on arm64 is causing kernel panic.

Version of the driver was upgraded to 0.8.4.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported-by: Greg V <greg@unrelenting.technology>
Tested-by: cperciva, Greg V <greg@unrelenting.technology>
Obtained from: Semihalf
MFC after: 2 weeks
Sponsored by: Amazon, Inc.
Differential revision: https://reviews.freebsd.org/D19655


# 1d65b4c0 15-Feb-2019 Marcin Wojtas <mw@FreeBSD.org>

Do not use ntc for obtaining buffer on Rx in the ENA

In out-of-order mode, Rx buffers are accessed by req_id.
Accessing and validating the mbuf using ntc causes a false error.

Increase driver revision after latest RX OOO completion fixes.

Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week


# 40abe76b 08-Jul-2018 Warner Losh <imp@FreeBSD.org>

Add PNP info to PCI attachment of ena driver

Make unsigned values uint16_t for pnp table. They are properly
uint16_t because they are 16-bit PCI IDs. The PNP_INFO language has no
type for bare unsigned.

Reviewed by: imp, chuck
Submitted by: Lakhan Shiva Kamireddy <lakhanshiva@gmail.com>
Sponsored by: Google, Inc. (GSoC 2018)
Pull Request: https://github.com/bsdimp/freebsd/pull/5


# fbb0ed71 10-May-2018 Marcin Wojtas <mw@FreeBSD.org>

Upgrade ENA version to v0.8.1

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.


# 4727bda6 09-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Allow usage of more RX descriptors than 1 in ENA driver

Using only 1 descriptor on RX could be an issue, if system would be low
on resources and could not provide driver with large chunks of
contiguous memory.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: byenduri_gmail.com
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12871


# 3cfadb28 09-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Read max MTU from the ENA device

The device now provides driver with max available MTU value it
can handle.

The function setting MTU for the interface was simplified and reworked
to follow these changes.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: byenduri_gmail.com
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12870


# 5a990212 08-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Cleanup of the ENA driver header file

Remove unused macros and fields - some of them were only initialized,
without further usage.

Implement minor style fixes and add required comments.

While at it, add the missing TX completion counter, which existed but
mistakenly remained unused.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12864


# 8805021a 08-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Allow partial MSI-x allocation in ENA driver

The situation, where part of the MSI-x was not configured properly, was
not properly handled. Now, the driver reduces number of queues to
reflect number of existing and properly configured MSI-x vectors.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: byenduri_gmail.com
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12863


# 0052f3b5 08-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Remove deprecated and unused counters in ENA driver

Few counters were imported from the Linux driver and never used,
because of differences between the Linux and FreeBSD APIs.

Queue stops and resumes are no longer supported by the driver and
counters were incremented indicating false events.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: rlibby
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12862


# 0bdffe59 09-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Refactor style of the ENA driver

* Change all conditional checks in "if" statement to boolean expressions
* Initialize variables with too complex values outside the declaration
* Fix indentations
* Move code associated with sysctls to ena_sysctl.c file
* For consistency, remove unnecessary "return" from void functions
* Use if_getdrvflags() function instead of accessing variable directly

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12860


# efe6ab18 09-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Check for Rx ring state to prevent from stall in the ENA driver

When the Rx ring is full and the driver fails to allocate Rx mbufs,
the ring could be stalled.

Keep alive checks the Rx ring state every second, and if it is full
for two cycles, it triggers the rx_cleanup routine in another thread.
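
The two-cycle check can be sketched as a small counter, shown below. It
is a hypothetical illustration: struct sketch_rx_watch and
sketch_rx_needs_cleanup are invented names, and how the driver detects
the stalled state or schedules the cleanup is outside the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* From the commit message: act after two consecutive bad checks. */
#define SKETCH_STALL_CYCLES 2u

/* Hypothetical per-ring bookkeeping for the periodic check. */
struct sketch_rx_watch {
	unsigned stalled_cycles;
};

/*
 * Called once per keep-alive period with the observed ring state.
 * Returns true when the cleanup routine should be scheduled.
 */
static bool
sketch_rx_needs_cleanup(struct sketch_rx_watch *w, bool ring_stalled)
{
	assert(w != NULL);

	if (!ring_stalled) {
		/* Ring made progress, so start counting from scratch. */
		w->stalled_cycles = 0;
		return (false);
	}

	if (++w->stalled_cycles >= SKETCH_STALL_CYCLES) {
		w->stalled_cycles = 0;
		return (true);
	}

	return (false);
}
```

Requiring two consecutive observations avoids triggering the cleanup
for a ring that is only momentarily in that state.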

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: byenduri_gmail.com
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12856


# 43fefd16 09-Nov-2017 Marcin Wojtas <mw@FreeBSD.org>

Add RX OOO completion feature

The RX out-of-order completion feature allows RX descriptors to be
completed out of order by keeping track of all free descriptors in
a separate array.
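
One way to picture the free-descriptor array is the sketch below. The
names (sketch_rx_ring, free_ids, sketch_post_buffer, sketch_complete)
and the fixed ring size are assumptions for illustration; the sketch
also assumes the caller never posts more buffers than there are free
ids.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 4u	/* power of two, assumed for masking */
#define SKETCH_RING_MASK (SKETCH_RING_SIZE - 1u)

struct sketch_rx_ring {
	uint16_t free_ids[SKETCH_RING_SIZE]; /* ids available for reuse */
	uint16_t next_to_use;	/* where the next free id is taken from */
	uint16_t next_to_clean;	/* where the next completed id is stored */
};

/* Initially every id is free and listed in ascending order. */
static void
sketch_ring_init(struct sketch_rx_ring *r)
{
	for (uint16_t i = 0; i < SKETCH_RING_SIZE; i++)
		r->free_ids[i] = i;
	r->next_to_use = 0;
	r->next_to_clean = 0;
}

/* Take a free id to tag a buffer that is handed to the device. */
static uint16_t
sketch_post_buffer(struct sketch_rx_ring *r)
{
	uint16_t id = r->free_ids[r->next_to_use];

	r->next_to_use = (r->next_to_use + 1u) & SKETCH_RING_MASK;
	return (id);
}

/* The device completed req_id, in any order; make it reusable. */
static void
sketch_complete(struct sketch_rx_ring *r, uint16_t req_id)
{
	assert(req_id < SKETCH_RING_SIZE);
	r->free_ids[r->next_to_clean] = req_id;
	r->next_to_clean = (r->next_to_clean + 1u) & SKETCH_RING_MASK;
}
```

Because buffers are looked up by the id reported in the completion,
the order in which the device finishes them does not matter.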

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: byenduri_gmail.com
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
Differential Revision: https://reviews.freebsd.org/D12855


# 30217e2d 31-Oct-2017 Marcin Wojtas <mw@FreeBSD.org>

Rework counting of hardware statistics in ENA driver

Do not read all statistics from the device. Instead, count them in the
driver, except for RX drops, which are received directly from the NIC
in the AENQ descriptor.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Reviewed by: imp
Obtained from: Semihalf
Sponsored by: Amazon.com, Inc.
Differential Revision: https://reviews.freebsd.org/D12852


# 1b069f1c 03-Jul-2017 Zbigniew Bodek <zbb@FreeBSD.org>

Replace mbuf defragmentation with collapse

Collapse should be more effective than defragmentation.
Added missing declaration of ena_check_and_collapse_mbuf().

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.


# 8a573700 03-Jul-2017 Zbigniew Bodek <zbb@FreeBSD.org>

Fix creation of dma tags and TSO settings

TSO settings were not reflecting real HW capabilities.

DMA tags were created with wrong window - high address was the same as
low, so excluding window was not working.

Capabilities of TX dma transaction were not set properly - TSO max size
had been increased and size of one segment had been adjusted.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.


# b9252a88 30-May-2017 Zbigniew Bodek <zbb@FreeBSD.org>

Move ENA's hw stats updating routine to separate task

Initially, stats were updated each time the OS requested the first
statistic.
To read statistics from hw, a condvar was used. cv_timedwait cannot be
called when an unsleepable lock is held, and this happens when FreeBSD
is requesting a statistic.
A separate task now reads statistics from the NIC every second.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10926


# 1e9fb899 30-May-2017 Zbigniew Bodek <zbb@FreeBSD.org>

Add mbuf defragmentation to the ENA driver

When mbuf chain is too long and device cannot handle that number
of segments in DMA transaction, mbuf chain will be defragmented.
Initially, driver was dropping all mbuf chains that were exceeding
supported number of segments.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10923


# 9b8d05b8 22-May-2017 Zbigniew Bodek <zbb@FreeBSD.org>

Add support for Amazon Elastic Network Adapter (ENA) NIC

ENA is a networking interface designed to make good use of modern CPU
features and system architectures.

The ENA device exposes a lightweight management interface with a
minimal set of memory mapped registers and extendable command set
through an Admin Queue.

The driver supports a range of ENA devices, is link-speed independent
(i.e., the same driver is used for 10GbE, 25GbE, 40GbE, etc.), and has
a negotiated and extendable feature set.

Some ENA devices support SR-IOV. This driver is used for both the
SR-IOV Physical Function (PF) and Virtual Function (VF) devices.

ENA devices enable high speed and low overhead network traffic
processing by providing multiple Tx/Rx queue pairs (the maximum number
is advertised by the device via the Admin Queue), a dedicated MSI-X
interrupt vector per Tx/Rx queue pair, and CPU cacheline optimized
data placement.

The ENA driver supports industry standard TCP/IP offload features such
as checksum offload and TCP transmit segmentation offload (TSO).
Receive-side scaling (RSS) is supported for multi-core scaling.

The ENA driver and its corresponding devices implement health
monitoring mechanisms such as watchdog, enabling the device and driver
to recover in a manner transparent to the application, as well as
debug logs.

Some of the ENA devices support a working mode called Low-latency
Queue (LLQ), which saves several more microseconds. This feature will
be implemented for driver in future releases.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Jakub Palider <jpa@semihalf.com>
Jan Medala <jan@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon.com Inc.
Differential revision: https://reviews.freebsd.org/D10427