History log of /freebsd-current/sys/dev/nvme/nvme.h
Revision Date Author Comments
# e84a75f9 07-Apr-2024 Warner Losh <imp@FreeBSD.org>

nvme: Add telemetry page definitions

Add definitions for page types 7 and 8 for host-initiated telemetry and
controller-initiated telemetry (they differ by one byte: a byte that is
defined in the host version is reserved in the controller version).

Sponsored by: Netflix


# ebcfab99 08-May-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Explicitly align struct nvme_command on an 8 byte boundary

This was already true for most architectures due to uint64_t structure
members. However, i386 is special in that it only requires 4 byte
alignment for uint64_t. As a result, casts from struct nvme_command
to struct nvmf_fabric_cmd were raising a "cast increases alignment"
warning on i386. Explicitly aligning struct nvme_command pacifies
this warning on i386.

Reported by: rscheff
Sponsored by: Chelsio Communications
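
A minimal C11 sketch of the effect described in ebcfab99. The struct
below is a hypothetical stand-in, not the real struct nvme_command, and
the use of standard _Alignas is an assumption; the header may spell the
alignment request differently (for example with a compiler attribute
macro).

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for a command structure. On i386 the uint64_t
 * member alone only forces 4-byte alignment, so the 8-byte alignment is
 * requested explicitly on the first member.
 */
struct demo_command {
	_Alignas(8) uint32_t cdw0;
	uint32_t nsid;
	uint64_t mptr;
};

/* Holds on every ABI, including ones where uint64_t is 4-byte aligned. */
_Static_assert(_Alignof(struct demo_command) == 8,
    "demo_command must be 8-byte aligned");
```

With the alignment stated on the type, a cast to another 8-byte-aligned
structure type no longer increases the required alignment.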


# 29d7e39f 07-May-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Bump the alignment of struct nvme_health_information_page to 8

This ensures that embedded uint64_t values used for statistics
counters are aligned when allocating a structure on the stack or as
part of a containing structure. In particular this quiets
-Waddress-of-packed-member warnings from GCC when compiling the code
in nvmfd to update the stats.

Reported by: GCC


# 5e3e4442 02-May-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants for the Fused Operation (FUSE) field in commands

Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44845


# d86edc18 02-May-2024 John Baldwin <jhb@FreeBSD.org>

nvmf.h: New header defining ioctls for NVMe over Fabrics

This defines structures, ioctl commands, and related constants used
for both the Fabrics host and controller.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44706


# 97b77de2 16-Apr-2024 Warner Losh <imp@FreeBSD.org>

nvme: Eliminate intel_log_temp_stats_swapbytes

We can't post an AER for this page, so there's no need to be able to swap
it to host byte order. It's not one of the standard defined pages that
can post via AER, and the vendor's public docs for this temperature page
don't suggest it's possible to get over or under event changes. Since
nvmecontrol no longer needs the swap routine, remove it now that it's
unused.

Sponsored by: Netflix
Reviewed by: chuck
Differential Revision: https://reviews.freebsd.org/D44659


# 0b8f21e8 03-Apr-2024 Warner Losh <imp@FreeBSD.org>

nvme: Add LPA bits

Add all the bits from the NVMe 2.0 base specification: CMD_EFFECTS to
indicate the commands and effects log page is supported, TELEMETRY to
indicate that the telemetry log pages and protocols are supported,
PERSISTENT_EVENTS to indicate the persistent event log is supported,
LOG_PAGES_PAGE to indicate that various log pages related to log page
and command support are supported (L0, L5, L12, and L13), and
DA4_TELEMETRY to indicate that the DA4 area is supported for telemetry
data.

Sponsored by: Netflix


# 21d3a84d 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add NVMe over Fabrics fields to nvme_controller_data

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44448


# 7fa8adb8 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants for the Controller Attributes field in cdata

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44447


# 88ecf154 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants and types for the discovery log page

This is used in NVMe over Fabrics to enumerate a list of available
controllers.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44446


# b354bb04 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants for fields in AER completion dword 0

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44445


# cbda1886 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants for the extended data for Get Log Page command flag

nvme(4) doesn't check this flag, but Fabrics implementations may need
to set this flag in the log page attributes cdata field.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44444


# b8cb8dd3 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add constants for the PSDT field in cdw0

This is not used in nvme(4) but is used in NVMe over Fabrics
transports which use SGLs to describe buffers instead of PRPs.

While here, adjust the shift value for the FUSE field to be relative
to the 'fuse' member of 'struct nvme_command'.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44443


# f21a54d1 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add SGL structure and constants for use in NVMe commands

Fabrics capsules use an SGL structure instead of prp1/2 addresses to
describe the data buffer used for a command. The SGL structure is
added to a union with the existing prp1/2 fields.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44442
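
The union described in f21a54d1 can be sketched roughly as follows. All
names here are hypothetical and the layout is an illustration of a
16-byte SGL descriptor overlaying two 8-byte PRP addresses, not a copy
of the definitions in nvme.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte SGL descriptor: address, length, type byte. */
struct demo_sgl_descriptor {
	uint64_t address;
	uint32_t length;
	uint8_t  reserved[3];
	uint8_t  type;
};

/* Data pointer area: either two PRP addresses or one SGL descriptor. */
struct demo_data_pointer {
	union {
		struct {
			uint64_t prp1;
			uint64_t prp2;
		};
		struct demo_sgl_descriptor sgl;
	};
};

/* Both views must occupy the same 16 bytes of the command. */
_Static_assert(sizeof(struct demo_data_pointer) == 16,
    "data pointer area must stay 16 bytes");
```

Because the union is anonymous (a C11 feature), code that already refers
to prp1 and prp2 by name would keep compiling unchanged in a layout like
this.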


# 1931b75e 22-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Export constants for min and max queue sizes

These are useful for NVMe over Fabrics.

Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44441


# 2a2682ee 06-Mar-2024 Warner Losh <imp@FreeBSD.org>

nvme: Add SMART WARNING for persistent memory region

NVME 2.0 added persistent memory regions, and this bit reports critical
warnings / errors with those regions.

Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D44213


# 7485926e 01-Mar-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Firmware revisions in the firmware slot info logpage are ASCII strings

In particular, don't try to byteswap the values as 64-bit integers and
always print a non-empty version as a string.

Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44121


# 3a477a9b 29-Jan-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Add NVMEF helper macro as the inverse of NVMEV

This macro accepts a field name and a value for the field and
constructs the shifted field value.

Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43604
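
A sketch of how the three helpers relate, assuming the usual convention
of a per-field NAME_SHIFT and NAME_MASK pair. The DEMO_ macros and the
field below are hypothetical illustrations and are not copied from
nvme.h.

```c
/* Hypothetical 3-bit field occupying bits 6:4 of a register. */
#define DEMO_FIELD_SHIFT	4
#define DEMO_FIELD_MASK		0x7

/* Full in-place mask for a named field (the "M" helper). */
#define DEMO_NVMEM(name)	(name##_MASK << name##_SHIFT)

/* Extract a field value from a register value (the "V" helper). */
#define DEMO_NVMEV(name, value)	\
	(((value) >> name##_SHIFT) & name##_MASK)

/* Build the shifted field from a plain value (the "F" helper). */
#define DEMO_NVMEF(name, value)	\
	(((value) & name##_MASK) << name##_SHIFT)
```

Under these definitions DEMO_NVMEF(DEMO_FIELD, 5) yields 0x50, and
DEMO_NVMEV(DEMO_FIELD, 0x50) recovers 5, which is the inverse
relationship the commit message describes.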


# 1dade1f2 29-Jan-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Rename NVMEB helper macro to NVMEM

The current macro always builds a full mask for a named field, so use
the M suffix for mask.

Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43601


# 479680f2 29-Jan-2024 John Baldwin <jhb@FreeBSD.org>

nvme: Use the NVMEV macro instead of expanded versions

Reviewed by: chuck
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D43595


# b46c7b1e 27-Dec-2023 Alexander Motin <mav@FreeBSD.org>

nvme: Add some bits from NVMe 2.0c spec.

MFC after: 1 week


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 081c22db 15-Aug-2023 John Baldwin <jhb@FreeBSD.org>

nvme.h: Fix a comment typo in admin opcode enum

Sponsored by: Chelsio Communications


# ac8c866f 07-Aug-2023 Warner Losh <imp@FreeBSD.org>

nvme: Add more NVME Base Spec 2.0 and NVME Command Set Spec 1.0a

Add admin commands capacity management, lockdown and fabrics commands.
Add I/O copy command.

Sponsored by: Netflix
Reviewed by: chuck, mav, jhb
Differential Revision: https://reviews.freebsd.org/D41311


# 5ae44634 27-Jun-2023 John Baldwin <jhb@FreeBSD.org>

nvme: Fix typo in "Command Aborted by Host" constant name.

Reviewed by: chuck, imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D40763


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 9a5acf36 19-Dec-2022 Dag-Erling Smørgrav <des@FreeBSD.org>

nvme: Clear the notify flag if the consumer rejects the controller.

While here, fix some type mismatch warnings.

Reviewed by: imp
Sponsored by: Netapp, Inc.
Sponsored by: Klara, Inc.
MFC after: 1 week


# 8ab99dbe 14-Nov-2022 Wanpeng Qian <wanpengqian@gmail.com>

bhyve: abort and return FEATURE_NOT_SAVEABLE while set feature with a save flag for NVMe controller.

Currently bhyve's NVMe controller cannot save feature values across
reboots. It should return a FEATURE_NOT_SAVEABLE error when the command
specifies a save flag.

Quote from NVMe specification, page 205:

https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf

If the Feature Identifier specified in the Set Features command is not
saveable by the controller and the controller receives a Set Features
command with the Save bit set to one, then the command shall be aborted
with a status of Feature Identifier Not Saveable.

Reviewed by: chuck (older version)
Approved by: manu (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32767


# a69c0964 05-Aug-2022 Alexander Motin <mav@FreeBSD.org>

nvme: Print CRD, M and DNR status bits on errors.

It may help with debugging some issues.

MFC after: 1 week


# 3086efe8 15-Apr-2022 Warner Losh <imp@FreeBSD.org>

nvme: Remove NVME_MAX_XFER_SIZE, replace inline calculation

NVME_MAX_XFER_SIZE used to be a constant (back when MAXPHYS was a
constant) to denote the smaller of MAXPHYS or the largest PRP we could
encode with our preallocation scheme. However, it's no longer constant
since MAXPHYS varies at runtime. In addition, the actual maximum is now
based on the drive's currently in-use page_size, which is also a runtime
expression. As such, remove the define and expand it inline in the one
place it's still used in the tree.

Sponsored by: Netflix
Reviewed by: chuck
Differential Revision: https://reviews.freebsd.org/D34870


# e66c1b51 15-Apr-2022 Warner Losh <imp@FreeBSD.org>

nvme: Define NVME_MPS_SHIFT

The memory page size (MPS) is expressed as 2^(number + 12), and other
items in the system inherit this. Create a define rather than
sprinkling 12 everywhere.

Sponsored by: Netflix
Reviewed by: chuck
Differential Revision: https://reviews.freebsd.org/D34865
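
The 2^(number + 12) relationship from e66c1b51 works out as below. The
DEMO_ names are hypothetical; only the value 12 comes from the commit
message.

```c
/* An MPS value of 0 means 2^12 = 4096 bytes. */
#define DEMO_MPS_SHIFT		12

/* Memory page size in bytes for a given MPS field value. */
#define DEMO_MPS_TO_BYTES(mps)	(1u << ((mps) + DEMO_MPS_SHIFT))
```

For example, an MPS value of 0 gives 4096 bytes, 1 gives 8192 bytes, and
4 gives 65536 bytes.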


# 214df80a 08-Apr-2022 Warner Losh <imp@FreeBSD.org>

nvme: new define for size of host memory buffer sizes

The nvme spec defines the various fields that specify sizes for host
memory buffers in terms of 4096 chunks. So, rather than use a bare 4096
here, use NVME_HMB_UNITS. This is explicitly not the host page size of
4096, nor the default memory page size (mps) of the NVMe drive, but its
own thing and needs its own define.

No functional change is intended, only the logical spelling of 4k.

Sponsored by: Netflix
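
A small sketch of the unit conversion that 214df80a names. The macro
names are hypothetical; the only fact taken from the commit message is
that host memory buffer sizes are counted in 4096-byte chunks.

```c
#include <stdint.h>

/* Host memory buffer sizes are expressed in units of 4096 bytes. */
#define DEMO_HMB_UNITS		4096

/* Widen before multiplying so large unit counts do not overflow. */
#define DEMO_HMB_BYTES(units)	((uint64_t)(units) * DEMO_HMB_UNITS)
```

Keeping this constant separate from the host page size and from the
drive's memory page size means the conversion stays correct even when
either of those is not 4096.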


# c2318cf8 21-Feb-2022 Chuck Tuffli <chuck@FreeBSD.org>

nvme: fix spelling of Namespace

Fix spelling of a macro definition.

Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D34330


# e71afa12 21-Feb-2022 Chuck Tuffli <chuck@FreeBSD.org>

nvme: Add OAES bit-field definitions

Create definitions for the Optional Asynchronous Events Supported (OAES)
values. Also adds a helper macro for the common use case of "mask and
shift". E.g.
value = NVME_CTRLR_DATA_OAES_NS_ATTR_MASK << NVME_CTRLR_DATA_OAES_NS_ATTR_SHIFT;
becomes
value = NVMEB(NVME_CTRLR_DATA_OAES_NS_ATTR);

Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D34300


# fea3cf1d 02-Jul-2021 Warner Losh <imp@FreeBSD.org>

nvme: Fix alignment on nvme structures

Remove __packed from nvme_command, nvme_completion and
nvme_dsm_trim. Add super-alignment to nvme_completion since it's always
at least that aligned in hardware (and in our existing uses of it
embedded in structures). It generates better code in
nvme_qpair_process_completions on riscv64 because otherwise the ABI
assumes a 4-byte alignment, and the same on all other platforms.

Reviewed by: jrtc27@, mav@, chuck@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D31001


# 80a75155 02-Jul-2021 Warner Losh <imp@FreeBSD.org>

nvme: style nit

Put the { on the same line as the struct nvme_foo when we define these
structures. It's FreeBSD standard and these were inconsistent.

Sponsored by: Netflix


# e83fdf8b 08-Jan-2021 Chuck Tuffli <chuck@FreeBSD.org>

fix big-endian platforms after 6733401935f8

The NVMe byte-swap routines for big-endian platforms used memcpy() to
move the unaligned 64-bit value into a temp register to byte swap it.
Instead of introducing a dependency, manually byte-swap the values in
place.

Point hat: me
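
One way to byte-swap a possibly unaligned 64-bit value in place without
memcpy() is to reverse its bytes directly. This is an illustrative
sketch with a hypothetical function name, not the code that e83fdf8b
committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the 8 bytes at p in place; p need not be 8-byte aligned. */
static inline void
demo_swap64_inplace(void *p)
{
	uint8_t *b = p;

	assert(p != NULL);
	for (size_t i = 0; i < 4; i++) {
		uint8_t tmp = b[i];

		b[i] = b[7 - i];
		b[7 - i] = tmp;
	}
}
```

Working a byte at a time avoids both the unaligned 64-bit load and the
dependency on a library routine.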


# 67334019 08-Jan-2021 Chuck Tuffli <chuck@FreeBSD.org>

nvmecontrol: add device self-test op and log page

Add decoding of the Device Self-test log page and the ability to start
or abort a test.

Reviewed by: imp, mav
Tested by: Muhammad Ahmad <muhammad.ahmad@seagate.com>
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D27517


# 8d08cdc7 02-Dec-2020 Chuck Tuffli <chuck@FreeBSD.org>

nvme: Fix typo in definition

Change occurrences of "selt test" to "self test" in the NVMe header
file.

Reviewed by: imp, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D27439


# cf7c0629 01-Dec-2020 Michal Meloun <mmel@FreeBSD.org>

Always use the __unused attribute even for potentially unused parameters.

Requested by: ian, imp
MFC with: r368167


# b2e9e573 30-Nov-2020 Michal Meloun <mmel@FreeBSD.org>

Unbreak r368167 in userland. Decorate unused arguments.

Reported by: kp, tuexen, jenkins, and many others
MFC with: r368167


# 52a83207 30-Nov-2020 Michal Meloun <mmel@FreeBSD.org>

NVME: Don't try to swap data on little endian machines.
These swapping functions violate the BUSDMA contract: we cannot write
to armed (by bus_dmamap_sync(PRE_..)) buffers. Remove them at least
from little endian machines until a better solution is developed.

Reviewed by: imp
MFC after: 3 weeks


# ac90f70d 28-Nov-2020 Alexander Motin <mav@FreeBSD.org>

Increase nvme(4) maximum transfer size from 1MB to 2MB.

With 4KB page size the 2MB is the maximum we can address with one page PRP.
Going further would require chaining, that would add some more complexity.

On the other side, to reduce memory consumption, allocate the PRP memory
respecting maximum transfer size reported in the controller identify data.
Many of NVMe devices support much smaller values, starting from 128KB.
To do that we have to change the initialization sequence to pull the data
earlier, before setting up the I/O queue pairs. The admin queue pair is
still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal,
since there is only one such queue with only 16 trackers.

Reviewed by: imp
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
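
The 2MB figure in ac90f70d follows from simple arithmetic, sketched
below with hypothetical names. It assumes 8-byte PRP entries and counts
only the entries that fit in a single PRP list page, ignoring the first
page addressed directly by the command.

```c
/* One PRP list page of 4KB holding 8-byte entries. */
#define DEMO_PAGE_SIZE		4096u
#define DEMO_PRP_ENTRY_SIZE	8u

/* 4096 / 8 = 512 entries fit in one list page. */
#define DEMO_PRPS_PER_PAGE	(DEMO_PAGE_SIZE / DEMO_PRP_ENTRY_SIZE)

/* Each entry maps one 4KB page: 512 * 4096 = 2MB. */
#define DEMO_MAX_XFER_SIZE	(DEMO_PRPS_PER_PAGE * DEMO_PAGE_SIZE)
```

Going past this size would need a second list page chained from the
first, which is the extra complexity the commit message mentions.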


# cd853791 27-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except the place which initializes maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embedded structures,
get a private MAXPHYS-like constant; their conversion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, were either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225


# 0bed3eab 13-Nov-2020 Alexander Motin <mav@FreeBSD.org>

Add PMRCAP printing and fix earlier CAP_HI.

MFC after: 3 days


# 6dd1985b 28-Oct-2020 Alexander Motin <mav@FreeBSD.org>

Fix unintentional constant rename in r367109.

MFC after: 1 week


# c44441f8 28-Oct-2020 Alexander Motin <mav@FreeBSD.org>

Print NVMe controller capabilities in verbose dmesg.

Those values are not reported in controller identification, while sometimes
interesting for development and debugging.

MFC after: 1 week


# e32d47f3 21-Sep-2020 David Bright <dab@FreeBSD.org>

Add an ioctl to get an NVMe device's maximum transfer size

Reviewed by: imp, chuck
Obtained from: Dell EMC Isilon
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26390


# d87b31e1 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

nvme: clean up empty lines in .c and .h files


# 881534f0 31-Aug-2020 Warner Losh <imp@FreeBSD.org>

Use symbolic names for async events

Rather than |= 0x300, define and use async event names for the
namespace changes and the firmware activations that we're asking for.


# 67abaee9 07-Jan-2020 Alexander Motin <mav@FreeBSD.org>

Add Host Memory Buffer support to nvme(4).

This allows the cheapest DRAM-less NVMe SSDs to use some host RAM (about
1MB per 1GB on the devices I have) for their metadata cache, significantly
improving random I/O performance. The device reports the minimal and
preferable size of the buffer. The code limits it to 1% of physical RAM
by default. If the buffer cannot be allocated or is below the minimal
size, the device will just have to work without it.

MFC after: 2 weeks
Relnotes: yes
Sponsored by: iXsystems, Inc.


# 70d20ed3 05-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Add `nvmecontrol resv` to handle NVMe reservations.

NVMe reservations are quite similar to SCSI persistent reservations and
can be used in clustered setups with shared multiport storage.

MFC after: 10 days
Relnotes: yes
Sponsored by: iXsystems, Inc.


# a6d222eb 02-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Add more random bits from NVMe 1.4.

MFC after: 2 weeks


# 6c99d132 02-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Decode few more NVMe log pages.

In particular: Changed Namespace List, Commands Supported and Effects,
Reservation Notification, Sanitize Status.

Add few new arguments to `nvmecontrol log` subcommand.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 8dafbebd 01-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Fix typo in r350529.

MFC after: 2 weeks


# 90dfa8f0 01-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Add more new fields and values from NVMe 1.4.

MFC after: 2 weeks


# a7bf63be 01-Aug-2019 Alexander Motin <mav@FreeBSD.org>

Add IOCTL to translate nvdX into nvmeY and NSID.

While very useful by itself, it also makes `nvmecontrol` not depend on
parsing hardcoded device names, which in turn makes it simple to take
nvdX (and potentially any other) device names as arguments.

Also, the added IOCTL bypass from nvdX to the respective nvmeYnsZ makes
them interchangeable for management purposes.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 8de2d8c0 28-Jul-2019 Alexander Motin <mav@FreeBSD.org>

Add some new fields and bits from NVMe 1.4.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 62d2cf18 18-Jul-2019 Warner Losh <imp@FreeBSD.org>

Provide macros to extract the sub-fields of the CAP_LO and CAP_HI registers.

These macros make places where we extract these easier to read. The shift and
mask stuff is also a bit tedious and error prone. Start with the CAP_LO and
CAP_HI registers since their scope is somewhat constrained. This is a style
change only; no functional changes.

Reviewed by: chuck
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20979
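
A sketch of what sub-field extraction macros of this kind can look like
for the low dword of the capabilities register. The macro names are
hypothetical; the bit positions used (MQES in bits 15:0, TO in bits
31:24) follow the NVMe register layout.

```c
/* Maximum Queue Entries Supported: bits 15:0 of CAP_LO. */
#define DEMO_CAP_LO_MQES(x)	(((x) >> 0) & 0xffffu)

/* Timeout, in 500ms units: bits 31:24 of CAP_LO. */
#define DEMO_CAP_LO_TO(x)	(((x) >> 24) & 0xffu)
```

A call site then reads as DEMO_CAP_LO_TO(cap_lo) rather than an inline
shift and mask, which is the readability gain the commit describes.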


# 1aed4995 05-May-2019 Alexander Motin <mav@FreeBSD.org>

Decode Deallocate Logical Block Features.

MFC after: 1 week


# 87b3975e 13-Dec-2018 Chuck Tuffli <chuck@FreeBSD.org>

nda(4) fix check for Dataset Management support

In the nda(4) driver, only set DISKFLAG_CANDELETE (a.k.a. can support
BIO_DELETE) if the drive supports Dataset Management. There are reports
that without this check, VMWare Workstation does not work reliably.

Fix is to check the ONCS field in the NVMe Controller Data structure for
support. This check previously existed but did not survive the
big-endian changes.

Reported by: yuripv@yuripv.net
Reviewed by: imp, mav, jimharris
Approved by: imp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D18493


# 9544e6dc 21-Aug-2018 Chuck Tuffli <chuck@FreeBSD.org>

Make NVMe compatible with the original API

The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).

This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.

This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.

Bump __FreeBSD_version to 1200081 due to API change

Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
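
The byte-oriented layout described in 9544e6dc can be illustrated with
the first dword of a command. This is a hypothetical sketch, not the
real struct nvme_command: single-byte members have no byte order, so
they read the same on big and little endian CPUs, while multi-byte
members such as the command identifier still need explicit conversion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of command dword 0 using whole bytes. */
struct demo_cmd_header {
	uint8_t  opc;	/* opcode */
	uint8_t  fuse;	/* fused-operation bits live in this byte */
	uint16_t cid;	/* command identifier, stored little endian */
};

/* Sub-byte fields are reached with a shift and mask on the byte. */
#define DEMO_FUSE_SHIFT	0
#define DEMO_FUSE_MASK	0x3

_Static_assert(sizeof(struct demo_cmd_header) == 4,
    "header must cover exactly one dword");
```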


# f439e3a4 24-May-2018 Alexander Motin <mav@FreeBSD.org>

Refactor NVMe CAM integration.

- Remove layering violation, when NVMe SIM code accessed CAM internal
device structures to set pointers on controller and namespace data.
Instead make NVMe XPT probe fetch the data directly from hardware.
- Cleanup NVMe SIM code, fixing support for multiple namespaces per
controller (reporting them as LUNs) and adding controller detach support
and run-time namespace change notifications.
- Add initial support for namespace change async events. So far only
in CAM mode, but it allows run-time namespace arrival and departure.
- Add missing nvme_notify_fail_consumers() call on controller detach.
Together with previous changes this allows NVMe device detach/unplug.

Non-CAM mode still requires a lot of love to stay on par, but at least
CAM mode code should not stay in the way so much, becoming much more
self-sufficient.

Reviewed by: imp
MFC after: 1 month
Sponsored by: iXsystems, Inc.


# afdbfe1e 19-Mar-2018 Warner Losh <imp@FreeBSD.org>

Starting LBA is a 64-bit number, so use htole64 instead of htole32. The
latter casts the LBA to a 32-bit number before assigning it to the
64-bit structure entity. This works fine on the first 2TB of TRIMs, but
terribly beyond that due to truncation.

Also, add an assert to make sure we don't send too many DSM TRIM
entries in one request.

Sponsored by: Netflix
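
The truncation fixed in afdbfe1e can be reproduced with plain casts.
The macro below is a hypothetical stand-in for the narrowing that a
32-bit conversion routine performs on its argument, and the 2TB figure
assumes 512-byte logical blocks.

```c
#include <stdint.h>

/* What the 64-bit field ends up holding after a 32-bit conversion. */
#define DEMO_TRUNC32(lba)	((uint64_t)(uint32_t)(lba))

/* 2^32 blocks of 512 bytes each is the 2TB boundary. */
#define DEMO_BOUNDARY_BYTES	((UINT64_C(1) << 32) * 512)
```

An LBA of 0xFFFFFFFF survives the narrowing, but 0x100000000 becomes 0,
so a TRIM aimed just past the boundary would land at the start of the
device.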


# 807e94b2 14-Mar-2018 Warner Losh <imp@FreeBSD.org>

Implement trim collapsing in nda

When multiple trims are in the queue, collapse them as much as
possible. At present, this usually results in only a few trims being
collapsed together, but more work on that will make it possible to do
hundreds (up to some configurable max).

Sponsored by: Netflix


# 01c1be35 12-Mar-2018 Alexander Motin <mav@FreeBSD.org>

Print fuses and fna fields in identify data.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 6b1a96b1 10-Mar-2018 Alexander Motin <mav@FreeBSD.org>

Add new opcodes and statuses from NVMe 1.3a.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# 3fa5467a 10-Mar-2018 Alexander Motin <mav@FreeBSD.org>

Add new identify data structures fields from NVMe 1.3a.

Some of them are already supported by existing hardware, so reporting
them in `nvmecontrol identify` can be useful.


# afdc2600 22-Feb-2018 Kyle Evans <kevans@FreeBSD.org>

nvme: Unbreak LE builds after r329824

The parameter 'p' is unused if _BYTE_ORDER == _LITTLE_ENDIAN. Add in a
(void)p to fix the build.


# 0d787e9b 22-Feb-2018 Wojciech Macek <wma@FreeBSD.org>

NVMe: Add big-endian support

Remove bitfields from defined structures as they are not portable.
Instead use shift and mask macros in the driver and nvmecontrol application.

NVMe is now working on powerpc64 host.

Submitted by: Michal Stanek <mst@semihalf.com>
Obtained from: Semihalf
Reviewed by: imp, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D13916


# 0028abe6 22-Feb-2018 Warner Losh <imp@FreeBSD.org>

Backout r329818, r329816 and r329815.

These aren't the commits I thought I was testing prior to
commit. Revert until I can sort out what happened and fix it.


# 4d87e271 21-Feb-2018 Warner Losh <imp@FreeBSD.org>

Combine BIO_DELETE requests for nda devices

Now that we're queueing BIO_DELETE requests in the CAM I/O scheduler,
it makes sense to try to combine as many as possible into a single
request to send down to hardware. Hopefully, lots of larger requests
like this are better than lots of individual transactions.

Note for future: need to limit based on total size of the trim
request. Should also collapse adjacent ranges where possible to
increase the size of the max payload.

Sponsored by: Netflix


# 718cf2cc 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/dev: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.


# 4e3b2744 13-Nov-2017 Warner Losh <imp@FreeBSD.org>

Provide link speed data in XPT_GET_TRAN_SETTINGS. Provide full version
information for that and XPT_PATH_INQ. Provide macros to encode/decode
major/minor versions. Read the link speed and lane count to compute
the base_transfer_speed for XPT_PATH_INQ.

Sponsored by: Netflix
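
Version encode/decode macros of the kind 4e3b2744 mentions can be
sketched as follows. The DEMO_ names are hypothetical; the layout
assumed is that of the NVMe version register, with the major number in
bits 31:16 and the minor number in bits 15:8.

```c
/* Pack a major/minor pair into version-register form. */
#define DEMO_REV(major, minor)	(((major) << 16) | ((minor) << 8))

/* Unpack the two components again. */
#define DEMO_MAJOR(rev)		(((rev) >> 16) & 0xffff)
#define DEMO_MINOR(rev)		(((rev) >> 8) & 0xff)
```

With this encoding, version 1.3 is 0x00010300, and versions compare
correctly as plain integers.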


# fa271a5d 15-Oct-2017 Warner Losh <imp@FreeBSD.org>

Closer examination shows that nvme and CAM both normally zero-fill
allocations (for req and ccb, which ultimately contain the
nvme_cmd). As such, we can micro-optimize these routines. Add a
comment to this effect, and bzero the ccb used to make the requests
for the nda dump routine so it more closely matches a ccb allocated
with xpt_get_ccb().

Sponsored by: Netflix


# fbed8df2 15-Oct-2017 Warner Losh <imp@FreeBSD.org>

Explicitly set reserved fields and 'fuse' to 0. This prevents us from
accidentally sending bogus values in these fields, which some drives
may reject with an error or worse (undefined behavior).

This is especially needed for the ndadump routine which allocates the
cmd from stack garbage....

Sponsored by: Netflix


# c2005bba 29-Aug-2017 Warner Losh <imp@FreeBSD.org>

Fix a few overlooked spots where the code uses 16-bit NSIDs. Chuck
Tuffli had submitted a more thorough patch that I was unaware of when
I did my work and this brings in the bits I missed from that patch.

PR: 220267
Submitted by: Chuck Tuffli


# 030edcce 25-Aug-2017 Warner Losh <imp@FreeBSD.org>

Fill in reserved areas from NVMe spec in the IDENTIFY structure
(struct nvme_controller_data) as defined in the NVM Express
specification, revision 1.3.

Sponsored by: Netflix


# 223a9b93 25-Aug-2017 Warner Losh <imp@FreeBSD.org>

Add feature codes from NVMe 1.3 specification:

o Autonomous Power State Transition
o Host Memory Buffer
o Timestamp
o Keep Alive Timer
o Host Controlled Thermal Management
o Non-Operational Power State Config

Also note that feature codes 0x78-0x7f are reserved for the NVMe
Management Interface.

Sponsored by: Netflix


# 0012e436 24-Aug-2017 Warner Losh <imp@FreeBSD.org>

Use _Static_assert

These files are compiled in userland too, so we can't use sys/systm.h
and rely on CTASSERT. Switch to using _Static_assert instead.

MFC After: 3 days
Sponsored by: Netflix
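
A compile-time size check of the kind described in 0012e436 and
0c26c199 might look like this. The struct is a hypothetical 16-byte
completion-entry stand-in rather than the real definition;
_Static_assert itself is standard C11 and needs no kernel header, which
is why it also works for userland builds.

```c
#include <stdint.h>

/* Hypothetical 16-byte completion queue entry. */
struct demo_completion {
	uint32_t cdw0;
	uint32_t rsvd1;
	uint16_t sqhd;
	uint16_t sqid;
	uint16_t cid;
	uint16_t status;
};

/* Fails the build if a member is added, removed, or padded. */
_Static_assert(sizeof(struct demo_completion) == 16,
    "bad size for demo_completion");
```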


# 0c26c199 24-Aug-2017 Warner Losh <imp@FreeBSD.org>

Sanity check sizes

Add compile time sanity checks to make sure that packed structures are
the proper size, typically as defined in the NVMe standard.


# 8a5d94f9 03-Aug-2017 Warner Losh <imp@FreeBSD.org>

Make nvd vs nda choice boot-time rather than build-time

Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.

Differential Revision: https://reviews.freebsd.org/D11825


# 594ffc03 27-Jun-2017 Warner Losh <imp@FreeBSD.org>

Add new definitions for namespaces.

Sponsored by: Netflix
Submitted by: Matt Williams (via D11330)


# 05ee702a 07-Mar-2017 Warner Losh <imp@FreeBSD.org>

cdw10 takes the low 32 bits and cdw11 takes the upper 32 bits of the
lba. Rather than do a cast to uint64_t, which clang warns might be
unaligned, do the stores 32 bits at a time.

Sponsored by: Netflix
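
Splitting the LBA across the two command dwords, as 05ee702a describes,
can be sketched like this. The struct, macros, and function are
hypothetical, and the sketch leaves out the little-endian conversion
that a driver would apply to each 32-bit store.

```c
#include <stdint.h>

/* Low and high halves of a 64-bit LBA. */
#define DEMO_LBA_LO(lba)	((uint32_t)((uint64_t)(lba) & 0xffffffffu))
#define DEMO_LBA_HI(lba)	((uint32_t)((uint64_t)(lba) >> 32))

/* Hypothetical slice of a command holding only the two LBA dwords. */
struct demo_rw_cmd {
	uint32_t cdw10;
	uint32_t cdw11;
};

/* Two aligned 32-bit stores instead of one possibly unaligned 64-bit one. */
static inline void
demo_set_lba(struct demo_rw_cmd *cmd, uint64_t lba)
{
	cmd->cdw10 = DEMO_LBA_LO(lba);
	cmd->cdw11 = DEMO_LBA_HI(lba);
}
```

Each store targets a naturally aligned 32-bit member, so no cast of the
dword pair to a uint64_t pointer is needed.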


# 0cf14228 19-Nov-2016 Warner Losh <imp@FreeBSD.org>

Implement HGST Log page 0xc1, as documented in the HGST SN100 and
SN150 product manuals. Subpage 0x32 is documented, but not implemented.

Sponsored by: Netflix, Inc


# ab1dd091 19-Nov-2016 Warner Losh <imp@FreeBSD.org>

Print Intel's expanded Temperature log page.

Sponsored by: Netflix, Inc


# d01f26f5 19-Nov-2016 Warner Losh <imp@FreeBSD.org>

Add log pages that Intel SSDs provide. It turns out that many of these
are widely implemented beyond just Intel drives.

Sponsored by: Netflix, Inc


# aea52879 19-Nov-2016 Warner Losh <imp@FreeBSD.org>

Add log pages defined through NVM Express 1.2.1.

Sponsored by: Netflix, Inc


# dc58cdf9 19-Nov-2016 Warner Losh <imp@FreeBSD.org>

Expand the SMART / Health Information Log Page (Page 02) printout
based on NVM Express 1.2.1 Standard.

Sponsored by: Netflix, Inc


# a498975e 18-Jul-2016 Scott Long <scottl@FreeBSD.org>

Implement crashdump support on NVME

MFC after: 3 days
Sponsored by: Netflix, Inc.


# f24c011b 10-Jun-2016 Warner Losh <imp@FreeBSD.org>

Commit the bits of nda that were missed. This should fix the build.

Approved by: re@


# ee7f4d81 10-Mar-2016 Alexander Motin <mav@FreeBSD.org>

Revert r292074 (by smh): Limit stripesize reported from nvd(4) to 4K

I believe that this patch handled the problem from the wrong side.
Instead of making ZFS properly handle large stripe sizes, it made an
unrelated driver lie in its reported parameters to work around that.

Alternative solution for this problem from ZFS side was committed at
r296615.

Discussed with: smh


# 038659e7 30-Jan-2016 Warner Losh <imp@FreeBSD.org>

Implement power command to list all power modes, find out the power
mode we're in and to set the power mode.


# fdf16a68 10-Dec-2015 Steven Hartland <smh@FreeBSD.org>

Limit stripesize reported from nvd(4) to 4K

Intel NVMe controllers have a slow path for I/Os that span a 128KB stripe boundary but ZFS limits ashift, which is derived from d_stripesize, to 13 (8KB) so we limit the stripesize reported to geom(8) to 4KB.

This may result in a small number of additional I/Os to require splitting in nvme(4), however the NVMe I/O path is very efficient so these additional I/Os will cause very minimal (if any) difference in performance or CPU utilisation.

This can be controlled by the new sysctl kern.nvme.max_optimal_sectorsize.

MFC after: 1 week
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4446
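
The stripe-boundary condition this commit refers to can be illustrated with
a short sketch. This is not the nvme(4) splitting code; the function name
sketch_spans_stripe() and its parameters are hypothetical, and it only shows
the arithmetic for deciding whether a byte range crosses a stripe boundary
(for example 128KB = 131072 bytes). For reference, an ashift of 13
corresponds to 1 << 13 = 8192 bytes (8KB).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when the byte range
 * [offset, offset + length) touches more than one stripe.
 */
static bool
sketch_spans_stripe(uint64_t offset, uint64_t length, uint64_t stripe_size)
{
	/* An empty range or an unset stripe size never spans a boundary. */
	if (length == 0 || stripe_size == 0)
		return (false);

	/* Compare the stripe index of the first and the last byte. */
	return (offset / stripe_size != (offset + length - 1) / stripe_size);
}
```

An 8KB I/O starting 4KB before a 128KB boundary spans it, while a 4KB I/O at
the same offset does not.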


# fdbd3d80 30-Oct-2015 Jim Harris <jimharris@FreeBSD.org>

nvd, nvme: report stripesize through GEOM disk layer

MFC after: 3 days
Sponsored by: Intel


# 992db80f 08-Oct-2013 Jim Harris <jimharris@FreeBSD.org>

Extend some 32-bit fields and variables to 64-bit to prevent overflow
when calculating stats in nvmecontrol perftest.

Sponsored by: Intel
Reported by: Joe Golio <joseph.golio@emc.com>
Reviewed by: carl
Approved by: re (hrs)
MFC after: 1 week
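
The kind of overflow addressed here is easy to reproduce in isolation. The
following is a minimal sketch, not the nvmecontrol perftest code: the
function names and the I/O figures are hypothetical and only illustrate why
a 32-bit accumulator is too small for byte totals.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes computed entirely in 32 bits. */
static uint32_t
sketch_total_bytes_32(uint32_t io_completed, uint32_t io_size)
{
	/* Unsigned arithmetic wraps modulo 2^32 once the product is large. */
	return (io_completed * io_size);
}

/* Hypothetical helper: the same total computed in 64 bits. */
static uint64_t
sketch_total_bytes_64(uint32_t io_completed, uint32_t io_size)
{
	/* Widen one operand first so the multiply is done in 64 bits. */
	return ((uint64_t)io_completed * io_size);
}
```

With 2,000,000 completed I/Os of 4096 bytes each, the true total is
8,192,000,000 bytes, which no longer fits in 32 bits.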


# a40e72a6 08-Oct-2013 Jim Harris <jimharris@FreeBSD.org>

Add driver-assisted striping for upcoming Intel NVMe controllers that can
benefit from it.

Sponsored by: Intel
Reviewed by: kib (earlier version), carl
Approved by: re (hrs)
MFC after: 1 week


# 56183abc 13-Aug-2013 Jim Harris <jimharris@FreeBSD.org>

Send a shutdown notification in the driver unload path, to ensure the
notification gets sent in cases where the system shuts down with the
driver unloaded.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


# 38441bd9 19-Jul-2013 Jim Harris <jimharris@FreeBSD.org>

Add message when nvd disks are attached and detached.

As part of this commit, add an nvme_strvis() function which borrows
heavily from cam_strvis(). This will allow stripping of
leading/trailing whitespace and also handle unprintable characters
in model/serial numbers. This function goes into a new nvme_util.c
file which is used by both the driver and nvmecontrol.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days
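
The behavior described for nvme_strvis() (trimming padding and replacing
unprintable bytes) can be sketched as follows. This is not the function from
nvme_util.c: the name sketch_strvis(), its signature, and the choice of '?'
as the substitute character are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: copy a fixed-width, space-padded field (such as a
 * model or serial number) into a NUL-terminated string.
 */
static void
sketch_strvis(char *dst, size_t dstlen, const uint8_t *src, size_t srclen)
{
	size_t i = 0;

	if (dstlen == 0)
		return;

	/* Strip leading spaces. */
	while (srclen > 0 && src[0] == ' ') {
		src++;
		srclen--;
	}
	/* Strip trailing spaces. */
	while (srclen > 0 && src[srclen - 1] == ' ')
		srclen--;

	/* Copy what fits, replacing bytes outside printable ASCII with '?'. */
	while (srclen > 0 && i + 1 < dstlen) {
		uint8_t c = *src++;

		srclen--;
		dst[i++] = (c >= 0x20 && c < 0x7f) ? (char)c : '?';
	}
	dst[i] = '\0';
}
```

Because the source fields are fixed-width and not NUL-terminated, the sketch
takes an explicit source length instead of relying on a terminator.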


# e8f25c62 17-Jul-2013 Jim Harris <jimharris@FreeBSD.org>

Define constants for the lengths of the serial number, model number
and firmware revision in the controller's identify structure.

Also modify consumers of these fields to ensure they only use the
specified number of bytes for their respective fields.

Sponsored by: Intel
Reviewed by: carl
MFC after: 3 days


# 66619178 11-Jul-2013 Jim Harris <jimharris@FreeBSD.org>

Fix a poorly worded comment in nvme(4).

MFC after: 3 days


# e9efbc13 09-Jul-2013 Jim Harris <jimharris@FreeBSD.org>

Update copyright dates.

MFC after: 3 days


# 49fac610 26-Jun-2013 Jim Harris <jimharris@FreeBSD.org>

Add firmware replacement and activation support to nvmecontrol(8) through
a new firmware command.

NVMe controllers may support up to 7 firmware slots for storing of
different firmware revisions. This new firmware command supports
firmware replacement (i.e. firmware download) with or without immediate
activation, or activation of a previously stored firmware image. It
also supports selection of the firmware slot during replacement
operations, using IDENTIFY information from the controller to
check that the specified slot is valid.

Newly activated firmware does not take effect until the next controller
reset, either via a reboot or separate 'nvmecontrol reset' command to the
same controller.

Submitted by: Joe Golio <joseph.golio@emc.com>
Obtained from: EMC / Isilon Storage Division
MFC after: 3 days


# 8d09e3c4 26-Jun-2013 Jim Harris <jimharris@FreeBSD.org>

Use MAXPHYS to specify the maximum I/O size for nvme(4).

Also allow admin commands to transfer up to this maximum I/O size, rather
than the artificial limit previously imposed. The larger I/O size is very
beneficial for upcoming firmware download support. This has the added
benefit of simplifying the code since both admin and I/O commands now use
the same maximum I/O size.

Sponsored by: Intel
MFC after: 3 days


# 5076698e 12-Apr-2013 Jim Harris <jimharris@FreeBSD.org>

Remove the NVME_IDENTIFY_CONTROLLER and NVME_IDENTIFY_NAMESPACE IOCTLs and replace
them with the NVMe passthrough equivalent.

Sponsored by: Intel


# 7c3f19d7 12-Apr-2013 Jim Harris <jimharris@FreeBSD.org>

Add support for passthrough NVMe commands.

This includes a new IOCTL to support a generic method for nvmecontrol(8) to pass
IDENTIFY, GET_LOG_PAGE, GET_FEATURES and other commands to the controller, rather than
separate IOCTLs for each.

Sponsored by: Intel


# 5fdf9c3c 01-Apr-2013 Jim Harris <jimharris@FreeBSD.org>

Add unmapped bio support to nvme(4) and nvd(4).

Sponsored by: Intel


# 232e2edb 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add the ability to internally mark a controller as failed, if it is unable to
start or reset. Also add a notifier for NVMe consumers for controller fail
conditions and plumb this notifier for nvd(4) to destroy the associated
GEOM disks when a failure occurs.

This requires a bit of work to cover the races when a consumer is sending
I/O requests to a controller that is transitioning to the failed state. To
help cover this condition, add a task to defer completion of I/Os submitted
to a failed controller, so that the consumer will still always receive its
completions in a different context than the submission.

Sponsored by: Intel
Reviewed by: carl


# 0d7e13ec 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Pass associated log page data to async event consumers, if requested.

Sponsored by: Intel
Reviewed by: carl


# 0692579b 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add structure definitions and controller command function for firmware
log pages.

Sponsored by: Intel
Reviewed by: carl


# 08927782 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add structure definitions and a controller command function for
error log pages.

Sponsored by: Intel
Reviewed by: carl


# cf81529c 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Create struct nvme_status.

NVMe error log entries include status, so breaking this out into
its own data structure allows it to be included in both the
nvme_completion data structure as well as error log entry data
structures.

While here, expose nvme_completion_is_error(), and change all of
the places that were explicitly looking at sc/sct bits to use this
macro instead.

Sponsored by: Intel
Reviewed by: carl
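
The idea of testing the sc/sct bits through a single helper can be sketched
as below. This is not the struct nvme_status or nvme_completion_is_error()
definition from nvme.h: the SKETCH_ names are hypothetical, and the sketch
uses shifts and masks on the 16-bit completion status word (phase tag in
bit 0, status code in bits 1-8, status code type in bits 9-11, per the NVMe
completion queue entry layout) rather than the header's own representation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the field positions in the 16-bit status word. */
#define SKETCH_STATUS_SC_SHIFT	1
#define SKETCH_STATUS_SC_MASK	0xffu
#define SKETCH_STATUS_SCT_SHIFT	9
#define SKETCH_STATUS_SCT_MASK	0x7u

/* Extract the status code (SC). */
static inline unsigned
sketch_status_sc(uint16_t status)
{
	return ((status >> SKETCH_STATUS_SC_SHIFT) & SKETCH_STATUS_SC_MASK);
}

/* Extract the status code type (SCT). */
static inline unsigned
sketch_status_sct(uint16_t status)
{
	return ((status >> SKETCH_STATUS_SCT_SHIFT) & SKETCH_STATUS_SCT_MASK);
}

/* A completion is an error when either SC or SCT is non-zero. */
static inline bool
sketch_status_is_error(uint16_t status)
{
	return (sketch_status_sc(status) != 0 ||
	    sketch_status_sct(status) != 0);
}
```

Keeping the test in one place means callers do not need to know the bit
layout, and the same check can be reused for error log entries.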


# dbba7442 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add API for nvme consumers to access controller and namespace identify data.

Sponsored by: Intel
Reviewed by: carl


# b846efd7 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add controller reset capability to nvme(4) and ability to explicitly
invoke it from nvmecontrol(8).

Controller reset will be performed in cases where I/Os are repeatedly
timing out, the controller reports an unrecoverable condition, or
when explicitly requested via IOCTL or an nvme consumer. Since the
controller may be in such a state where it cannot even process queue
deletion requests, we will perform a controller reset without trying
to clean up anything on the controller first.

Sponsored by: Intel
Reviewed by: carl


# 5f1e251d 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Create a generic nvme_ctrlr_cmd_get_log_page function, and change the
health information log page function to use it.

Sponsored by: Intel


# 99d99f74 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Expose the get/set features API to nvme consumers.

Sponsored by: Intel


# 038a5ee4 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Add an interface for nvme shim drivers (i.e. nvd) to register for
notifications when new nvme controllers are added to the system.

Sponsored by: Intel


# 0a0b08cc 26-Mar-2013 Jim Harris <jimharris@FreeBSD.org>

Enable asynchronous event requests on non-Chatham devices.

Also add logic to clean up all outstanding asynchronous event requests
when resetting or shutting down the controller, since these requests
will not be explicitly completed by the controller itself.

Sponsored by: Intel


# 0f71ecf7 17-Oct-2012 Jim Harris <jimharris@FreeBSD.org>

Add ability to queue nvme_request objects if no nvme_trackers are available.

This eliminates the need to manage queue depth at the nvd(4) level for
Chatham prototype board workarounds, and also adds the ability to
accept a number of requests on a single qpair that is much larger
than the number of trackers allocated.

Sponsored by: Intel


# 9eb93f29 17-Oct-2012 Jim Harris <jimharris@FreeBSD.org>

Add return codes to all functions used for submitting commands to I/O
queues.

Sponsored by: Intel


# be4dcf1b 18-Sep-2012 Jim Harris <jimharris@FreeBSD.org>

Add __aligned(4) to NVMe defined data structures.

This fixes an issue in nvmecontrol(8), where clang throws a cast-align
warning when casting a __packed structure pointer to a uint32_t
pointer as part of printing raw hex output.

Reported by: dhw
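
The effect of combining packing with an explicit alignment can be seen in a
small sketch. The structures below are hypothetical and are not taken from
nvme.h; the attribute spelling is the GCC/clang form rather than the
__packed and __aligned() macros used in the FreeBSD tree.

```c
#include <stdint.h>

/* Packed only: the compiler assumes nothing beyond 1-byte alignment. */
struct sketch_packed {
	uint8_t		opc;
	uint32_t	nsid;
} __attribute__((packed));

/*
 * Packed and explicitly aligned: members stay unpadded, but every object
 * of this type starts on a 4-byte boundary, so casting its address to a
 * uint32_t pointer no longer increases the required alignment.
 */
struct sketch_packed_aligned {
	uint8_t		opc;
	uint32_t	nsid;
} __attribute__((packed, aligned(4)));
```

The aligned variant also has its size rounded up to a multiple of 4, which
suits code that dumps the structure one 32-bit word at a time.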


# bb0ec6b3 17-Sep-2012 Jim Harris <jimharris@FreeBSD.org>

This is the first of several commits which will add NVM Express (NVMe)
support to FreeBSD. A full description of the overall functionality
being added is below. nvmexpress.org defines NVM Express as "an optimized
register interface, command set and feature set for PCI Express (PCIe)-based
Solid-State Drives (SSDs)."

This commit adds nvme(4) and nvd(4) driver source code and Makefiles
to the tree.

Full NVMe functionality description:
Add nvme(4) and nvd(4) drivers and nvmecontrol(8) for NVM Express (NVMe)
device support.

There will continue to be ongoing work on NVM Express support, but there
is more than enough to allow for evaluation of pre-production NVM Express
devices as well as soliciting feedback. Questions and feedback are welcome.

nvme(4) implements NVMe hardware abstraction and is a provider of NVMe
namespaces. The closest equivalent of an NVMe namespace is a SCSI LUN.
nvd(4) is an NVMe consumer, surfacing NVMe namespaces as GEOM disks.
nvmecontrol(8) is used for NVMe configuration and management.

The following are currently supported:
nvme(4)
- full mandatory NVM command set support
- per-CPU IO queues (enabled by default but configurable)
- per-queue sysctls for statistics and full command/completion queue
dumps for debugging
- registration API for NVMe namespace consumers
- I/O error handling (except for timeouts; see below)
- compilation switches for support back to stable-7

nvd(4)
- BIO_DELETE and BIO_FLUSH (if supported by controller)
- proper BIO_ORDERED handling

nvmecontrol(8)
- devlist: list NVMe controllers and their namespaces
- identify: display controller or namespace identify data in
human-readable or hex format
- perftest: quick and dirty performance test to measure raw
performance of NVMe device without userspace/physio/GEOM
overhead

The following are still work in progress and will be completed over the
next 3-6 months in rough priority order:
- complete man pages
- firmware download and activation
- asynchronous error requests
- command timeout error handling
- controller resets
- nvmecontrol(8) log page retrieval

This has been primarily tested on amd64, with light testing on i386. I
would be happy to provide assistance to anyone interested in porting
this to other architectures, but am not currently planning to do this
work myself. Big-endian and dmamap sync for command/completion queues
are the main areas that would need to be addressed.

The nvme(4) driver currently has references to Chatham, which is an
Intel-developed prototype board which is not fully spec compliant.
These references will all be removed over time.

Sponsored by: Intel
Contributions from: Joe Golio/EMC <joseph dot golio at emc dot com>