History log of /linux-master/include/linux/nvme.h
Revision Date Author Comments
# 25802f3a 23-Jan-2024 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme-rdma: move NVME_RDMA_IP_PORT from common file

The correct place for this definition is the nvme rdma header file and
not the common nvme header file.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 29f69753 05-Feb-2024 Keith Busch <kbusch@kernel.org>

nvme: implement support for relaxed effects

NVM Express TP4167 provides a way for controllers to report a relaxed
execution constraint. Specifically, it notifies of exclusivity for IO
vs. admin commands instead of grouping these together. If set, then we
don't need to freeze IO in order to execute that admin command. The
freezing distrupts IO processes, so it's nice to avoid that if the
controller tells us it's not necessary.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# f9e9115d 31-Jan-2024 Caleb Sander <csander@purestorage.com>

nvme: take const cmd pointer in read-only helpers

nvme_is_fabrics() and nvme_is_write() only read struct nvme_command,
so take it by const pointer. This allows callers to pass a const pointer
and communicates that these functions don't modify the command.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 41353fba 18-Jan-2024 Guixin Liu <kanie@linux.alibaba.com>

nvmet: unify aer type enum

The host and target use two definition of aer type, unify
them into a single one.

Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 2ad28ce9 31-Dec-2023 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme: remove unused definition

There is no users for NVMF_AUTH_HASH_LEN macro.

Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 75276847 25-Oct-2023 Mark O'Donovan <shiftee@posteo.net>

nvme-auth: auth success1 msg always includes resp

In cases where RVALID is false, the response is still transmitted,
but is cleared to zero.

Relevant extract from the spec:
Response R2, if valid (i.e., if the RVALID field is set to 01h),
cleared to 0h otherwise

Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 646f45b2 24-Aug-2023 Hannes Reinecke <hare@suse.de>

nvme: add TCP TSAS definitions

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# b938e660 23-Jun-2023 Ankit Kumar <ankit.kumar@samsung.com>

nvme: fix the NVME_ID_NS_NVM_STS_MASK definition

As per NVMe command set specification 1.0c Storage tag size is 7 bits.

Fixes: 4020aad85c67 ("nvme: add support for enhanced metadata")
Signed-off-by: Ankit Kumar <ankit.kumar@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# be1a7cd2 07-Apr-2023 Mike Christie <michael.christie@oracle.com>

nvme: Add a nvme_pr_type enum

The next patch adds support to report the reservation type, so we need to
be able to convert from the NVMe PR value we get from the device to the
linux block layer PR value that will be returned to callers. To prepare
for that, this patch adds a nvme_pr_type enum and renames the nvme_pr_type
function.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-13-michael.christie@oracle.com
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 5fd96a4e 07-Apr-2023 Mike Christie <michael.christie@oracle.com>

nvme: Add pr_ops read_keys support

This patch adds support for the pr_ops read_keys callout by calling the
NVMe Reservation Report helper, then parsing that info to get the
controller's registered keys. Because the callout is only used in the
kernel where the callers, like LIO, do not know about controller/host IDs,
the callout just returns the registered keys which is required by the SCSI
PR in READ KEYS command.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-12-michael.christie@oracle.com
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# f2bf2e7e 07-Apr-2023 Mike Christie <michael.christie@oracle.com>

nvme: Fix reservation status related structs

This fixes the following issues with the reservation status structs:

1. resv10 is bytes 23:10 so it should be 14 bytes.
2. regctl_ds only supports 64 bit host IDs.

These are not currently used, but will be in this patchset which adds
support for the reservation report command.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230407200551.12660-8-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 8e19b87c 09-Mar-2023 Minwoo Im <minwoo.im.dev@gmail.com>

nvme-trace: show more opcode names

We have more commands to show in the trace. Sync up.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6f99ac04 13-Dec-2022 Christoph Hellwig <hch@lst.de>

nvme: consult the CSE log page for unprivileged passthrough

Commands like Write Zeros can change the contents of a namespaces without
actually transferring data. To protect against this, check the Commands
Supported and Effects log is supported by the controller for any
unprivileg command passthrough and refuse unprivileged passthrough if the
command has any effects that can change data or metadata.

Note: While the Commands Support and Effects log page has only been
mandatory since NVMe 2.0, it is widely supported because Windows requires
it for any command passthrough from userspace.

Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>


# 685e6311 21-Dec-2022 Christoph Hellwig <hch@lst.de>

nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition

3 << 16 does not generate the correct mask for bits 16, 17 and 18.
Use the GENMASK macro to generate the correct mask instead.

Fixes: 84fef62d135b ("nvme: check admin passthru command effects")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>


# 1b96f862 30-Oct-2022 Christoph Hellwig <hch@lst.de>

nvme: implement the DEAC bit for the Write Zeroes command

While the specification allows devices to either deallocate data
or to actually write zeroes on any Write Zeroes command, many SSDs
only do the sensible thing and deallocate data when the DEAC bit
is specific. Set it when it is supported and the caller doesn't
explicitly opt out of deallocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>


# 855b7717 31-Oct-2022 Kanchan Joshi <joshi.k@samsung.com>

nvme: fine-granular CAP_SYS_ADMIN for nvme io commands

Currently both io and admin commands are kept under a
coarse-granular CAP_SYS_ADMIN check, disregarding file mode completely.

$ ls -l /dev/ng*
crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1
crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2

In the example above, ng0n1 appears as if it may allow unprivileged
read/write operation but it does not and behaves same as ng0n2.

This patch implements a shift from CAP_SYS_ADMIN to more fine-granular
control for io-commands.
If CAP_SYS_ADMIN is present, nothing else is checked as before.
Otherwise, following rules are in place
- any admin-cmd is not allowed
- vendor-specific and fabric commmand are not allowed
- io-commands that can write are allowed if matching FMODE_WRITE
permission is present
- io-commands that read are allowed

Add a helper nvme_cmd_allowed that implements above policy.
Change all the callers of CAP_SYS_ADMIN to go through nvme_cmd_allowed
for any decision making.
Since file open mode is counted for any approval/denial, change at
various places to keep file-mode information handy.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1c32a801 20-Sep-2022 Christoph Hellwig <hch@lst.de>

nvme: improve the NVME_CONNECT_AUTHREQ* definitions

Mark them as unsigned so that we don't need extra casts, and define
them relative to cdword0 instead of requiring extra shifts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>


# 88b140fe 27-Jun-2022 Hannes Reinecke <hare@suse.de>

nvme: add definitions for NVMe In-Band authentication

Add new definitions for NVMe In-band authentication as defined in
the NVMe Base Specification v2.0.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2c61c97f 08-Jun-2022 Michael Kelley <mikelley@microsoft.com>

nvme: handle the persistent internal error AER

In the NVM Express Revision 1.4 spec, Figure 145 describes possible
values for an AER with event type "Error" (value 000b). For a
Persistent Internal Error (value 03h), the host should perform a
controller reset.

Add support for this error using code that already exists for
doing a controller reset. As part of this support, introduce
two utility functions for parsing the AER type and subtype.

This new support was tested in a lab environment where we can
generate the persistent internal error on demand, and observe
both the Linux side and NVMe controller side to see that the
controller reset has been done.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5c629dc9 06-Jul-2022 Keith Busch <kbusch@kernel.org>

nvme: use struct group for generic command dwords

This will allow the trace event to know the full size of the data
intended to be copied and silence read overflow checks.

Reported-by: John Garry <john.garry@huawei.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 23c9cd56 21-Jun-2022 Joel Granados <j.granados@samsung.com>

nvme: fix the CRIMS and CRWMS definitions to match the spec

Adjust the values of NVME_CAP_CRMS_CRIMS and NVME_CAP_CRMS_CRWMS masks as
they are different from the ones in TP4084 - Time-to-ready.

Fixes: 354201c53e61 ("nvme: add support for TP4084 - Time-to-Ready Enhancements").
Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 354201c5 16-May-2022 Christoph Hellwig <hch@lst.de>

nvme: add support for TP4084 - Time-to-Ready Enhancements

Add support for using longer timeouts during controller initialization
and letting the controller come up with namespaces that are not ready
for I/O yet. We skip these not ready namespaces during scanning and
only bring them online once anoter scan is kicked off by the AEN that
is set when the NRDY bit gets set in the I/O Command Set Independent
Identify Namespace Data Structure. This asynchronous probing avoids
blocking the kernel boot when controllers take a very long time to
recover after unclean shutdowns (up to minutes).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>


# e626f37e 16-May-2022 Christoph Hellwig <hch@lst.de>

nvme: split the enum used for various register constants

Instead of having one big enum add one for each register or field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# ca2d8992 27-Apr-2022 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme: add missing status values to verbose logging

Log a few more path related status codes.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5974ea7c 14-Mar-2022 Sungup Moon <sungup.moon@samsung.com>

nvme: allow duplicate NSIDs for private namespaces

A NVMe subsystem with multiple controller can have private namespaces
that use the same NSID under some conditions:

"If Namespace Management, ANA Reporting, or NVM Sets are supported, the
NSIDs shall be unique within the NVM subsystem. If the Namespace
Management, ANA Reporting, and NVM Sets are not supported, then NSIDs:
a) for shared namespace shall be unique; and
b) for private namespace are not required to be unique."

Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec.

Make sure this specific setup is supported in Linux.

Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head")
Signed-off-by: Sungup Moon <sungup.moon@samsung.com>
[hch: refactored and fixed the controller vs subsystem based naming
conflict]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# 4020aad8 03-Mar-2022 Keith Busch <kbusch@kernel.org>

nvme: add support for enhanced metadata

NVM Express ratified TP 4068 defines new protection information formats.
Implement support for the CRC64 guard tags.

Since the block layer doesn't support variable length reference tags,
driver support for the Storage Tag space is not supported at this time.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Klaus Jensen <its@irrelevant.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220303201312.3255347-9-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 86c2457a 08-Feb-2022 Martin Belanger <martin.belanger@dell.com>

nvme: expose cntrltype and dctype through sysfs

TP8010 introduces the Discovery Controller Type attribute (dctype).
The dctype is returned in the response to the Identify command. This
patch exposes the dctype through the sysfs. Since the dctype depends on
the Controller Type (cntrltype), another attribute of the Identify
response, the patch also exposes the cntrltype as well. The dctype will
only be displayed for discovery controllers.

A note about the naming of this attribute:
Although TP8010 calls this attribute the Discovery Controller Type,
note that the dctype is now part of the response to the Identify
command for all controller types. I/O, Discovery, and Admin controllers
all share the same Identify response PDU structure. Non-discovery
controllers as well as pre-TP8010 discovery controllers will continue
to set this field to 0 (which has always been the default for reserved
bytes). Per TP8010, the value 0 now means "Discovery controller type is
not reported" instead of "Reserved". One could argue that this
definition is correct even for non-discovery controllers, and by
extension, exposing it in the sysfs for non-discovery controllers is
appropriate.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# bd83fe6f 03-Feb-2022 Alan Adamson <alan.adamson@oracle.com>

nvme: add verbose error logging

Improves logging of NVMe errors. If NVME_VERBOSE_ERRORS is configured,
a verbose description of the error is logged, otherwise only status
codes/bits is logged.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
[kch]: fix several nits, cosmetics, and trim down code.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 785d584c 18-Oct-2021 Hannes Reinecke <hare@suse.de>

nvme: add new discovery log page entry definitions

TP8014 adds a new SUBTYPE value and a new field EFLAGS for the
discovery log page entry.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d56ae18f 23-Sep-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

nvmet: use macro definitions for setting cmic value

This makes the code more readable.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e15a8a97 22-Sep-2021 Hannes Reinecke <hare@suse.de>

nvme: add CNTRLTYPE definitions for 'identify controller'

Update the 'identify controller' structure to define the newly added
CNTRLTYPE field.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8cf486e1 15-Jun-2021 Wesley Sheng <wesley.sheng@amd.com>

nvme.h: add missing nvme_lba_range_type endianness annotations

Signed-off-by: Wesley Sheng <wesley.sheng@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# aaf2e048 09-Jun-2021 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvmet: add ZBD over ZNS backend support

NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones for NVMe
protocol-based controllers. NVMeOF already support the ZNS NVMe
Protocol compliant devices on the target in the passthru mode. There
are generic zoned block devices like Shingled Magnetic Recording (SMR)
HDDs that are not based on the NVMe protocol.

This patch adds ZNS backend support for non-ZNS zoned block devices as
NVMeOF targets.

This support includes implementing the new command set NVME_CSI_ZNS,
adding different command handlers for ZNS command set such as NVMe
Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
NVMe Zone Management Send and NVMe Zone Management Receive.

With the new command set identifier, we also update the target command
effects logs to reflect the ZNS compliant commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ab5d0b38 09-Jun-2021 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvmet: add Command Set Identifier support

NVMe TP 4056 allows controllers to support different command sets.
NVMeoF target currently only supports namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is a value in exposing namespaces that contain
logical blocks that have special access rules (e.g. sequentially write
required namespace such as Zoned Namespace (ZNS)).

In order to support the Zoned Block Devices (ZBD) backend, controllers
need to have support for ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code such that it can now
support the default command set identifier. We update the namespace data
structure to store the CSI value which defaults to NVME_CSI_NVM
that represents traditional logical blocks namespace type.

The CSI support is required to implement the ZBD backend for NVMeOF
with host side NVMe ZNS interface, since ZNS commands belong to
the different command set than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5befc7c2 24-Mar-2021 Keith Busch <kbusch@kernel.org>

nvme: implement non-mdts command limits

Commands that access LBA contents without a data transfer between the
host historically have not had a spec defined upper limit. The driver
set the queue constraints for such commands to the max data transfer
size just to be safe, but this artificial constraint frequently limits
devices below their capabilities.

The NVMe Workgroup ratified TP4040 defines how a controller may
advertise their non-MDTS limits. Use these if provided and default to
the current constraints if not. Since the Dataset Management command
limits are defined in logical blocks, but without a namespace to tell us
the logical block size, the code defaults to the safe 512b size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4a407d5e 26-Jan-2021 Johannes Thumshirn <johannes.thumshirn@wdc.com>

nvme: add tracing of zns commands

When support for the NVMe ZNS commands was merged, tracing of these has
been omitted.

Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as
nvme_cmd_zone_append to the nvme driver's tracing facility.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3254899e 21-Jan-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme: update enumerations for status codes

All the updates are mentioned in the ratified NVMe 1.4 spec.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 20d3bb92 14-Jan-2021 Klaus Jensen <k.jensen@samsung.com>

nvme-pci: allow use of cmb on v1.4 controllers

Since NVMe v1.4 the Controller Memory Buffer must be explicitly enabled
by the host.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
[hch: avoid a local variable and add a comment]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c1fef73f 24-Jul-2020 Logan Gunthorpe <logang@deltatee.com>

nvmet: add passthru code to process commands

Add passthru command handling capability for the NVMeOF target and
export passthru APIs which are used to integrate passthru
code with nvmet-core.

The new file passthru.c handles passthru cmd parsing and execution.
In the passthru mode, we create a block layer request from the nvmet
request and map the data on to the block layer request.

Admin commands and features are on an allow list as there are a number
of each that don't make too much sense with passthrough. We use an
allow list such that new commands can be considered before being blindly
passed through. In both cases, vendor specific commands are always
allowed.

We also reject reservation IO commands as the underlying device cannot
differentiate between multiple hosts behind a fabric.

Based-on-a-patch-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 240e6ee2 29-Jun-2020 Keith Busch <keith.busch@wdc.com>

nvme: support for zoned namespaces

Add support for NVM Express Zoned Namespaces (ZNS) Command Set defined
in NVM Express TP4053. Zoned namespaces are discovered based on their
Command Set Identifier reported in the namespaces Namespace
Identification Descriptor list. A successfully discovered Zoned
Namespace will be registered with the block layer as a host managed
zoned block device with Zone Append command support. A namespace that
does not support append is not supported by the driver.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# be93e87e 29-Jun-2020 Keith Busch <keith.busch@wdc.com>

nvme: support for multiple Command Sets Supported and Effects log pages

The Commands Supported and Effects log page was extended with a CSI
field that enables the host to query the log page for each command set
supported. Retrieve this log page for each command set that an attached
namespace supports, and save a pointer to that log in the namespace head.

Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <keith.busch@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 71010c30 29-Jun-2020 Niklas Cassel <niklas.cassel@wdc.com>

nvme: implement multiple I/O Command Set support

Implements support for multiple I/O Command Sets. NVMe TP 4056
introduces a method to enumerate multiple command sets per namespace. If
the command set is exposed, this method for enumeration will be used
instead of the traditional method that uses the CC.CSS register command
set register for command set identification.

For namespaces where the Command Set Identifier is not supported or
recognized, the specific namespace will not be created.

Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Matias Bjørling <matias.bjorling@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 39481fbd 19-May-2020 Israel Rukshin <israelr@mellanox.com>

nvme: add Metadata Capabilities enumerations

The enumerations will be used to expose the namespace metadata format by
the target.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f1e71d75 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

nvme: replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 92decf11 03-Apr-2020 Keith Busch <kbusch@kernel.org>

nvme: define constants for identification values

Improve code readability by defining the specification's constants that
the driver is using when decoding identification payloads.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Bart van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d02abd19 04-May-2020 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvmet: align addrfam list to spec

With reference to the NVMeOF Specification (page 44, Figure 38)
discovery log page entry provides address family field. We do set the
transport type field but the adrfam field is not set when using loop
transport and also it doesn't have support in the nvme-cli. So when
reading discovery log page with a loop transport it leads to confusing
output.

As per the spec for adrfam value 254 is reserved for Intra Host
Transport i.e. loopback), we add a required macro in the protocol
header file, set default port disc addr entry's adrfam to
NVMF_ADDR_FAMILY_MAX, and update nvmet_addr_family configfs array for
show/store attribute.

Without this patch, setting adrfam to (ipv4/ipv6/ib/fc/loop/" ") we get
following output for nvme discover command from nvme-cli which is
confusing.
trtype: loop
adrfam: ipv4
trtype: loop
adrfam: ipv6
trtype: loop
adrfam: infiniband
trtype: loop
adrfam: fibre-channel
trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam: pci # <----- pci for loop
trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = " "
adrfam: pci # <----- pci for unrecognized

This patch fixes above output :-
trtype: loop
adrfam: ipv4
trtype: loop
adrfam: ipv6
trtype: loop
adrfam: infiniband
trtype: loop
adrfam: fibre-channel
trtype: loop # ${CFGFS_HOME}/nvmet/ports/1/addr_adrfam = loop
adrfam: loop # <----- loop for loop
trtype: loop # ${CFGFS_HOME}/config/nvmet/ports/adrfam = " "
adrfam: unrecognized # <----- unrecognized when invalid value

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 52deba0f 14-Nov-2019 Akinobu Mita <akinobu.mita@gmail.com>

nvme: hwmon: provide temperature min and max values for each sensor

According to the NVMe specification, the over temperature threshold and
under temperature threshold features shall be implemented for Composite
Temperature if a non-zero WCTEMP field value is reported in the Identify
Controller data structure. The features are also implemented for all
implemented temperature sensors (i.e., all Temperature Sensor fields that
report a non-zero value).

This provides the over temperature threshold and under temperature
threshold for each sensor as temperature min and max values of hwmon
sysfs attributes.

The WCTEMP is already provided as a temperature max value for Composite
Temperature, but this change isn't incompatible. Because the default
value of the over temperature threshold for Composite Temperature is
the WCTEMP.

Now the alarm attribute for Composite Temperature indicates one of the
temperature is outside of a temperature threshold. Because there is only
a single bit in Critical Warning field that indicates a temperature is
outside of a threshold.

Example output from the "sensors" command:

nvme-pci-0100
Adapter: PCI adapter
Composite: +33.9°C (low = -273.1°C, high = +69.8°C)
(crit = +79.8°C)
Sensor 1: +34.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 5: +47.9°C (low = -273.1°C, high = +65261.8°C)

This also adds helper macros for kelvin from/to milli Celsius conversion,
and replaces the repeated code in hwmon.c.

Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Jean Delvare <jdelvare@suse.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 48c9e85b 14-Oct-2019 Revanth Rajashekar <revanth.rajashekar@intel.com>

nvme: resync include/linux/nvme.h with nvmecli

Update enumerations and structures in include/linux/nvme.h
to resync with the nvmecli.

All the updates are mentioned in the ratified NVMe 1.4 spec
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2dc3947b 13-Oct-2019 Max Gurtovoy <maxg@mellanox.com>

nvme: introduce "Command Aborted By host" status code

Fix the status code of canceled requests initiated by the host according
to TP4028 (Status Code 0x371):
"Command Aborted By host: The command was aborted as a result of host
action (e.g., the host disconnected the Fabric connection)."

Also in a multipath environment, unless otherwise specified, errors of
this type (path related) should be retried using a different path, if
one is available.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c1e0cc7e 07-Aug-2019 Benjamin Herrenschmidt <benh@kernel.crashing.org>

nvme-pci: Add support for variable IO SQ element size

The size of a submission queue element should always be 6 (64 bytes)
by spec.

However some controllers such as Apple's are not properly implementing
the standard and require a different size.

This provides the ground work for the subsequent quirks for these
controllers.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# a5ef7572 04-Aug-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme: trace: support for Get LBA Status opcode parsed

This patch adds Get LBA Status command's opcode to the macro that is
used by the trace feature. Now we can see "get_lba_status" instead of
the opcode value itself.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# c6389845 04-Aug-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme: add Get LBA Status command opcode

NVMe 1.4 added Get LBA Status command with opcode 0x86.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 6605bdd5 28-Jun-2019 Bart Van Assche <bvanassche@acm.org>

nvme: add I/O characteristics fields

Several new fields have been introduced in version 1.4 of the NVMe spec
at offsets that were defined as reserved in version 1.3d of the NVMe
spec. Update the definition of the nvme_id_ns data structure such that
it is in sync with version 1.4 of the NVMe spec. This change preserves
backwards compatibility.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ad795e47 12-Jun-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme-trace: support for fabrics commands in host-side

This patch introduces fabrics commands tracing feature from host-side.
This patch does not include any changes for the previous host-side
tracing, but just add fabrics commands parsing in cmd=() format.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: fixed some whitespace damage]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 26f2990d 12-Jun-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme-trace: move opcode symbol print to nvme.h

The following patches are going to provide the target-side trace which
might need these kind of macros. It would be great if it can be shared
between host and target side both.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 7a1f46e3 05-Jun-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme: introduce nvme_is_fabrics to check fabrics cmd

This patch introduces a nvme_is_fabrics() inline function to check
whether or not the given command structure is for fabrics.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9581ae4f 11-May-2019 Minwoo Im <minwoo.im.dev@gmail.com>

nvme: fix typos in nvme status code values

Fix typos in enumeration names for nvme status:

s/ACIVATE/ACTIVATE/
s/INSUFFICENT/INSUFFICIENT/

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d808b7f7 09-Apr-2019 Keith Busch <kbusch@kernel.org>

nvmet: fix discover log page when offsets are used

The nvme target hadn't been taking the Get Log Page offset parameter
into consideration, and so has been returning corrupted log pages when
offsets are used. Since many tools, including nvme-cli, split the log
request to 4k, we've been breaking discovery log responses when more
than 3 subsystems exist.

Fix the returned data by internally generating the entire discovery
log page and copying only the requested bytes into the user buffer. The
command log page offset type has been modified to a native __le64 to
make it easier to extract the value from a command.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Tested-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# fadccd8f 18-Feb-2019 Christoph Hellwig <hch@lst.de>

nvme_ioctl.h: remove duplicate GPL boilerplate

We already have a ЅPDX header, so no need to duplicate the information.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# b34de7ce 12-Dec-2018 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: add error log page slot definition

This patch adds the NVMe error slot definition from the spec.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# b7c8f366 12-Dec-2018 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: remove nvme_common command cdw10 array

This is a preparation patch which removes the nvme common command cdw10
array and replace with individual fields. This is needed for the nvmet
error log page implementation make is error log page entry offset
assignment easier.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# fc221d05 03-Dec-2018 Sagi Grimberg <sagi@lightbitslabs.com>

nvme-tcp: Add protocol header

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 49cd84b6 27-Nov-2018 Keith Busch <kbusch@kernel.org>

nvme: implement Enhanced Command Retry

A controller may have an internal state that is not able to successfully
process commands for a short duration. In such states, an immediate
command requeue is expected to fail. The driver may exceed its max
retry count, which permanently ends the command in failure when the same
command would succeed after waiting for the controller to be ready.

NVMe ratified TP 4033 provides a delay hint in the completion status
code for failed commands. Implement the retry delay based on the command
completion status and the controller's requested delay.

Note that requeued commands are handled per request_queue, not per
individual request. If multiple commands fail, the controller should
consistently report the desired delay time for retryable commands in
all CQEs, otherwise the requeue list may be kicked too soon.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9b95d2fb 20-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvmet: expose support for fabrics SQ flow control disable in treq

Technical Proposal introduces an indication for SQ flow control
disable support. Expose it since we are able to operate in this mode.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0445e1b5 19-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvmet: don't override treq upon modification.

Only override the allowed parts of it.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: slight tweak to the NVME_TREQ_SECURE_CHANNEL_MASK definition]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e6a622fd 19-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvmet: support fabrics sq flow control

Technical proposal 8005 "fabrics SQ flow control" introduces a mode
where a host and controller agree to omit sq_head pointer updates
when sending nvme completions.

In case the host indicated desire to operate in this mode (connect attribute)
the controller will return back a connect completion with sq_head value
of 0xffff as indication that it will omit sq_head pointer updates.

This mode saves us an atomic update in the I/O path.

Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: suggested better implementation]
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f301c2b1 12-Nov-2018 Jay Sternberg <jay.e.sternberg@intel.com>

nvmet: add defines for discovery change async events

Add AEN/AER values as defined by the specification

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7114ddeb 12-Nov-2018 Jay Sternberg <jay.e.sternberg@intel.com>

nvmet: change aen mask functions to use bit numbers

Functions nvmet_aen_disabled and nvmet_clear_aen were using
values not bit numbers ie 1 << 9 not 9 for bit function clear_bit
and test_and_set_bit.

Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com>
Reviewed-by: Phil Cayton <phil.cayton@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6e3ca03e 02-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvme: support traffic based keep-alive

If the controller supports traffic based keep alive, we restart the keep
alive timer if any admin or io commands was completed during the kato
period. This prevents a possible starvation of keep alive commands in
the presence of heavy traffic as in such case, we already have a health
indication from the host perspective.

Only set a comp_seen indicator in case the controller supports keep
alive to minimize the overhead for pci controllers.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 12b21171 02-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvme: introduce ctrl attributes enumeration

We are growing more controller attributes, so use a proper enumeration
for it. For now just add the 128-bit hostid which we support.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 783f4a44 27-Sep-2018 James Smart <jsmart2021@gmail.com>

nvme: call nvme_complete_rq when nvmf_check_ready fails for mpath I/O

When an io is rejected by nvmf_check_ready() due to validation of the
controller state, the nvmf_fail_nonready_command() will normally return
BLK_STS_RESOURCE to requeue and retry. However, if the controller is
dying or the I/O is marked for NVMe multipath, the I/O is failed so that
the controller can terminate or so that the io can be issued on a
different path. Unfortunately, as this reject point is before the
transport has accepted the command, blk-mq ends up completing the I/O
and never calls nvme_complete_rq(), which is where multipath may preserve
or re-route the I/O. The end result is, the device user ends up seeing an
EIO error.

Example: single path connectivity, controller is under load, and a reset
is induced. An I/O is received:

a) while the reset state has been set but the queues have yet to be
stopped; or
b) after queues are started (at end of reset) but before the reconnect
has completed.

The I/O finishes with an EIO status.

This patch makes the following changes:

- Adds the HOST_PATH_ERROR pathing status from TP4028
- Modifies the reject point such that it appears to queue successfully,
but actually completes the io with the new pathing status and calls
nvme_complete_rq().
- nvme_complete_rq() recognizes the new status, avoids resetting the
controller (likely was already done in order to get this new status),
and calls the multipather to clear the current path that errored.
This allows the next command (retry or new command) to select a new
path if there is one.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 93045d59 08-Aug-2018 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme.h: add support for ns write protect definitions

Add various definitions from NVMe 1.3 TP 4005.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8b92d0e3 08-Aug-2018 Hannes Reinecke <hare@suse.com>

nvme.h: fixup ANA group descriptor format

ANA Phase 3 draft had the 'reserved' field in the group descriptor
format set to '23:17' (so that the first namespace identifier started
at byte 24), but that got move with the approved TP to '31:17'
(so that the first namespace identifier started at byte 32).

Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1a376216 13-May-2018 Christoph Hellwig <hch@lst.de>

nvme.h: add ANA definitions

Add various defintions from NVMe 1.3 TP 4004.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 9b89bc38 12-May-2018 Christoph Hellwig <hch@lst.de>

nvme.h: add support for the log specific field

NVMe 1.3 added a new log specific field to the get log page CQ
defintion, add it to our get_log_page SQ structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 40c6f9c2 15-Jun-2018 Revanth Rajashekar <revanth.rajashekar@intel.com>

nvme.h: resync with nvme-cli

Added some feature ids present in nvme-cli but not kernel.

Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# aafd3afe 25-May-2018 Hannes Reinecke <hare@suse.com>

nvme.h: add AEN configuration symbols

Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# b3984e06 25-May-2018 Christoph Hellwig <hch@lst.de>

nvme.h: add the changed namespace list log

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 868c2392 22-May-2018 Christoph Hellwig <hch@lst.de>

nvme.h: untangle AEN notice definitions

Stop including the event type in the definitions for the notice type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 88de4598 20-Dec-2017 Christoph Hellwig <hch@lst.de>

nvme-pci: clean up SMBSZ bit definitions

Define the bit positions instead of macros using the magic values,
and move the expanded helpers to calculate the size and size unit into
the implementation C file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# e3d7874d 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme: send uevent for some asynchronous events

This will give udev a chance to observe and handle asynchronous event
notifications and clear the log to unmask future events of the same type.
The driver will create a change uevent of the asyncronuos event result
before submitting the next AEN request to the device if a completed AEN
event is of type error, smart, command set or vendor specific,

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 38dabe21 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme: centralize AEN defines

All the transports were unnecessarilly duplicating the AEN request
accounting. This patch defines everything in one place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 84fef62d 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme: check admin passthru command effects

The NVMe standard provides a command effects log page so the host may
be aware of special requirements it may need to do for a particular
command. For example, the command may need to run with IO quiesced to
prevent timeouts or undefined behavior, or it may change the logical block
formats that determine how the host needs to construct future commands.

This patch saves the nvme command effects log page if the controller
supports it, and performs appropriate actions before and after an admin
passthrough command is completed. If the controller does not support the
command effects log page, the driver will define the effects for known
opcodes. The nvme format and santize are the only commands in this patch
with known effects.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d85cf207 07-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme: add transport SGL definitions

Add transport SGL defintions from NVMe TP 4008, required for
the final NVMe-FC standard.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c98cb3bd 07-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme.h: remove FC transport-specific error values

The NVM express group recinded the reserved range for the transport.
Remove the FC-centric values that had been defined.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 044a9df1 10-Sep-2017 Christoph Hellwig <hch@lst.de>

nvme-pci: implement the HMB entry number and size limitations

Adds support for the new Host Memory Buffer Minimum Descriptor Entry Size
and Host Memory Maximum Descriptors Entries field that were added in
TP 4002 HMB Enhancements. These allow the controller to advertise
limits for the usual number of segments in the host memory buffer, as
well as a minimum usable per-segment size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>


# 223694b9 30-Aug-2017 Changpeng Liu <changpeng.liu@intel.com>

nvme: fix the definition of the doorbell buffer config support bit

NVMe 1.3 specification defines the Optional Admin Command Support feature
flags, bit 8 set to '1' then the controller supports the Doorbell Buffer
Config command. Bit 7 is used for Virtualization Mangement command.

Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
Cc: stable@vger.kernel.org


# 60b43f62 13-Aug-2017 Max Gurtovoy <maxg@mellanox.com>

nvme: rename AMS symbolic constants to fit specification

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ad4e05b2 13-Aug-2017 Max Gurtovoy <maxg@mellanox.com>

nvme: add symbolic constants for CC identifiers

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# dbf86b39 16-Aug-2017 Jon Derrick <jonathan.derrick@intel.com>

nvme: add support for NVMe 1.3 Timestamp Feature

NVME's Timestamp feature allows controllers to be aware of the epoch
time in milliseconds. This patch adds the set features hook for various
transports through the identify path, so that resets and resumes can
update the controller as necessary.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
[hch: rebased on top of nvme-4.13 error handling changes,
changed nvme_configure_timestamp to return the status]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 62346eae 12-Jul-2017 Arnav Dawn <a.dawn@samsung.com>

nvme: define NVME_NSID_ALL

Define the constant "0xffffffff" (used as nsid for all namespaces)
as NVME_NSID_ALL.

Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# b6dccf7f 12-Jul-2017 Arnav Dawn <a.dawn@samsung.com>

nvme: add support for FW activation without reset

This patch adds support for handling Fw activation without reset
On completion of FW-activation-starting AER, all queues are
paused till CSTS.PP is cleared or timed out (exceeds max time for
fw activtion MTFA). If device fails to clear CSTS.PP within MTFA,
driver issues reset controller.

Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 2fd4167f 12-Jul-2017 Jon Derrick <jonathan.derrick@intel.com>

nvme: fabrics commands should use the fctype field for data direction

Fabrics commands with opcode 0x7F use the fctype field to indicate data
direction.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Sagi Grimberg <sai@grmberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: eb793e2c ("nvme.h: add NVMe over Fabrics definitions")


# dc1a0afb 14-Jul-2017 Christoph Hellwig <hch@lst.de>

nvme: fix byte swapping in the streams code

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7aa1f427 18-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: use a single NVME_AQ_DEPTH and relax it to 32

No need to differentiate fabrics from pci/loop, also lower
it to 32 as we don't really need 256 inflight admin commands.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f5d11840 27-Jun-2017 Jens Axboe <axboe@kernel.dk>

nvme: add support for streams and directives

This adds support for Directives in NVMe, particular for the Streams
directive. Support for Directives is a new feature in NVMe 1.3. It
allows a user to pass in information about where to store the data, so
that it the device can do so most effiently. If an application is
managing and writing data with different life times, mixing differently
retentioned data onto the same locations on flash can cause write
amplification to grow. This, in turn, will reduce performance and life
time of the device.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6b8190d6 15-Jun-2017 Scott Bauer <scott.bauer@intel.com>

nvme: implement NS Optimal IO Boundary from 1.3 Spec

The NVMe 1.3 spec introduces Namespace Optimal IO Boundaries (NOIOB),
which standardizes the stripe mechanism we currently have quirks for.
This patch implements the necessary logic to handle this new feature.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 435e8090 12-Jun-2017 Guan Junxiong <guanjunxiong@huawei.com>

nvme: add fields into identify controller data structure

Add the new to NVMe 1.3 fields EDSTT, DSTO, FWUG, HCTMA, MNTMT, MXTMT,
and SANICAP into the idenfity controller data structure.

Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c61d788b 07-Jun-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvmet: allow overriding the NVMe VS via configfs

Allow overriding the announced NVMe Version of a via configfs.

This is particularly helpful when debugging new features for the host
or target side without bumping the hard coded version (as the target
might not be fully compliant to the announced version yet).

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# af8b86e9 07-Jun-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvme: introduce NVMe Namespace Identification Descriptor structures

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 0add5e8e 07-Jun-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvmet: use NVME_IDENTIFY_DATA_SIZE

Use NVME_IDENTIFY_DATA_SIZE define instead of hard coding the magic
4096 value.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: converted three more users]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 97f6ef64 24-May-2017 Xu Yu <yu.a.xu@intel.com>

nvme-pci: remap BAR0 to cover admin CQ doorbell for large stride

The existing driver initially maps 8192 bytes of BAR0 which is
intended to cover doorbells of admin SQ and CQ. However, if a
large stride, e.g. 10, is used, the doorbell of admin CQ will
be out of 8192 bytes. Consequently, a page fault will be raised
when the admin CQ doorbell is accessed in nvme_configure_admin_queue().

This patch fixes this issue by remapping BAR0 before accessing
admin CQ doorbell if the initial mapping is not enough.

Signed-off-by: Xu Yu <yu.a.xu@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# b85cf734 12-May-2017 Arnav Dawn <a.dawn@samsung.com>

nvme.h: add dword 12 - 15 fields to struct nvme_features

Signed-off-by: Arnav Dawn <a.dawn@samsung.com>
[hch: split from a larger patch, new changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 39673e19 09-Jan-2017 Christoph Hellwig <hch@lst.de>

nvme.h: add struct nvme_host_mem_buf_desc and HMB flags

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 8e412263 17-May-2017 Christoph Hellwig <hch@lst.de>

nvme: switch to uuid_t

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# f9f38e33 09-Apr-2017 Helen Koike <helen.koike@collabora.co.uk>

nvme: improve performance for virtual NVMe devices

This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefy here. The main idea for
the patch is provide the device two memory location locations:
1) to store the doorbell values so they can be lookup without the doorbell
MMIO write
2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specification, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.

FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).

The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.

Contributions:
Eric Northup <digitaleric@google.com>
Frank Swiderski <fes@google.com>
Ted Tso <tytso@mit.edu>
Keith Busch <keith.busch@intel.com>

Just to give an idea on the performance boost with the vendor
extension: Running fio [1], a stock NVMe driver I get about 200K read
IOPs with my vendor patch I get about 1000K read IOPs. This was
running with a null device i.e. the backing device simply returned
success on every read IO request.

[1] Running on a 4 core machine:
fio --time_based --name=benchmark --runtime=30
--filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32
--direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4
--rw=randread --blocksize=4k --randrepeat=false

Signed-off-by: Rob Nelson <rlnelson@google.com>
[mlin: port for upstream]
Signed-off-by: Ming Lin <mlin@kernel.org>
[koike: updated for upstream]
Signed-off-by: Helen Koike <helen.koike@collabora.co.uk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>


# bf17aa36 01-Mar-2017 Roland Dreier <roland@purestorage.com>

nvme: Correct NVMF enum values to match NVMe-oF rev 1.0

The enum values for QPTYPE, PRTYPE and CMS are off by 1 from the
values defined in figure 42 of the NVM Express over Fabrics 1.0:

http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605-1.pdf

Fix our enums to match the final spec.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# c5552fde 07-Feb-2017 Andy Lutomirski <luto@kernel.org>

nvme: Enable autonomous power state transitions

NVMe devices can advertise multiple power states. These states can
be either "operational" (the device is fully functional but possibly
slow) or "non-operational" (the device is asleep until woken up).
Some devices can automatically enter a non-operational state when
idle for a specified amount of time and then automatically wake back
up when needed.

The hardware configuration is a table. For each state, an entry in
the table indicates the next deeper non-operational state, if any,
to autonomously transition to and the idle time required before
transitioning.

This patch teaches the driver to program APST so that each successive
non-operational state will be entered after an idle time equal to 100%
of the total latency (entry plus exit) associated with that state.
The maximum acceptable latency is controlled using dev_pm_qos
(e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
states with total latency greater than this value will not be used.
As a special case, setting the latency tolerance to 0 will disable
APST entirely. On hardware without APST support, the sysfs file will
not be exposed.

The latency tolerance for newly-probed devices is set by the module
parameter nvme_core.default_ps_max_latency_us.

In theory, the device can expose "default" APST table, but this
doesn't seem to function correctly on my device (Samsung 950), nor
does it seem particularly useful. There is also an optional
mechanism by which a configuration can be "saved" so it will be
automatically loaded on reset. This can be configured from
userspace, but it doesn't seem useful to support in the driver.

On my laptop, enabling APST seems to save nearly 1W.

The hardware tables can be decoded in userspace with nvme-cli.
'nvme id-ctrl /dev/nvmeN' will show the power state table and
'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST
configuration.

This feature is quirked off on a known-buggy Samsung device.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 986994a2 26-Jan-2017 Parav Pandit <parav@mellanox.com>

nvme: Use CNS as 8-bit field and avoid endianness conversion

This patch defines CNS field as 8-bit field and avoids cpu_to/from_le
conversions.
Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS
for readability (don't rely on the fact that NVME_ID_CNS_NS = 0).

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8a9ae523 17-Feb-2017 Scott Bauer <scott.bauer@intel.com>

nvme: Check for Security send/recv support before issuing commands.

We need to verify that the controller supports the security
commands before actually trying to issue them.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
[hch: moved the check so that we don't call into the OPAL code if not
supported]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# b35ba01e 08-Feb-2017 Christoph Hellwig <hch@lst.de>

nvme: support ranged discard requests

NVMe supports up to 256 ranges per DSM command, so wire up support
for ranged discards up to that limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# cba3bdfd 02-Dec-2016 James Smart <jsmart2021@gmail.com>

nvme-fabrics: Add FC transport error codes to nvme.h

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3b7c33b2 30-Nov-2016 Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>

nvme.h: add Write Zeroes definitions

Add the command structure, optional command set support (ONCS) bit and
a new error code for the Write Zeroes command.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# d49187e9 10-Nov-2016 Christoph Hellwig <hch@lst.de>

nvme: introduce struct nvme_request

This adds a shared per-request structure for all NVMe I/O. This structure
is embedded as the first member in all NVMe transport drivers request
private data and allows to implement common functionality between the
drivers.

The first use is to replace the current abuse of the SCSI command
passthrough fields in struct request for the NVMe command passthrough,
but it will grow a field more fields to allow implementing things
like common abort handlers in the future.

The passthrough commands are handled by having a pointer to the SQE
(struct nvme_command) in struct nvme_request, and the union of the
possible result fields, which had to be turned from an anonymous
into a named union for that purpose. This avoids having to pass
a reference to a full CQE around and thus makes checking the result
a lot more lightweight.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 329dd768 30-Sep-2016 Christoph Hellwig <hch@lst.de>

nvme.h: add an enum for cns values

Ported over from nvme-cli.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8d63687a 30-Sep-2016 Christoph Hellwig <hch@lst.de>

nvme.h: don't use uuid_be

This makes life easier for nvme-cli and we don't really need the uuid
type anyway to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a446c084 30-Sep-2016 Christoph Hellwig <hch@lst.de>

nvme.h: resync with nvme-cli

Import a few updates to nvme.h from nvme-cli. This mostly includes a few
new fields and error codes, but also a few renames that so far are only
used in user space. Also one field is moved from an array of two le64
values to one of 16 u8 values so that we can more easily access it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8ef2074d 19-Oct-2016 Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>

nvme: Add tertiary number to NVME_VS

NVMe 1.2.1 specification adds a tertiary element to the version number.
This updates the macro and its callers to include the final number and
fixup a single place in nvmet where the version was generated manually.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7a665d2f 28-Jun-2016 Daniel Verkamp <daniel.verkamp@intel.com>

nvme-fabrics: change NQN UUID to big-endian format

NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the
UUID format of NQN uses a UUID based on RFC 4122.

RFC 4122 specifies that the UUID is encoded in big-endian byte order.

Switch the NVMe over Fabrics host ID field from little-endian UUID to
big-endian UUID to match the specification.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 7b89eae2 13-Jun-2016 Sagi Grimberg <sagi@grimberg.me>

nvme.h: Add keep-alive opcode and identify controller attribute

KAS: keep-alive support and granularity of kato in units of 100 ms
nvme_admin_keep_alive opcode: 0x18

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# eb793e2c 13-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme.h: add NVMe over Fabrics definitions

The NVMe over Fabrics specification defines a protocol interface and
related extensions to NVMe that enable operation over network protocols.
The NVMe over Fabrics specification has an NVMe Transport binding for
each NVMe Transport.

This patch adds the fabrics related definitions:
- fabric specific command set and error codes
- transport addressing and binding definitions
- fabrics sgl extensions
- controller identification fabrics enhancements
- discovery log page definition

Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7a5abb4b 06-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme: factor out a add nvme_is_write helper

Centralize the check if a given NVMe command reads or writes data.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 3972be23 06-Jun-2016 James Smart <james.smart@broadcom.com>

nvme.h: add constants for PSDT and FUSE values

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 79f370ea 06-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme.h: add AER constants

Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 69cd27e2 06-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme.h: add NVM command set SQE/CQE size defines

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 725b3588 06-Jun-2016 Armen Baloyan <armenx.baloyan@intel.com>

nvme.h: Add get_log_page command strucure

Add get_log_page command structure and a corresponding entry in
nvme_command union

Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Reviewed--by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 14e974a8 06-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme.h: add RTD3R, RTD3E and OAES fields

These have been added in NVMe 1.2 and we'll need at least oaes for the
NVMe target driver.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a5b714ad 27-Apr-2016 Wang Sheng-Hui <shhuiw@foxmail.com>

NVMe: correct comment for offset enum of controller registers in nvme.h

Section 3.1 gives the comment for the offset of controller registers
in the specification 1.2a.

Some are mis-copied in the header file nvme.h. Correct them.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7a67cbea 20-Nov-2015 Christoph Hellwig <hch@lst.de>

nvme: use offset instead of a struct for registers

This makes life easier for future non-PCI drivers where access to the
registers might be more complicated. Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
[Fixed CMBSZ offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2812dfe3 09-Oct-2015 Christoph Hellwig <hch@lst.de>

nvme: include <linux/types.ĥ> in <linux/nvme.h>

The buildbot complains about this even if it doesn't generate
a a build warning. But it's an easy fix, so here we go:

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 08c69640 02-Oct-2015 Christoph Hellwig <hch@lst.de>

nvme.h: add missing nvme_id_ctrl endianess annotations

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9d99a8dd 02-Oct-2015 Christoph Hellwig <hch@lst.de>

nvme: move hardware structures out of the uapi version of nvme.h

Currently all NVMe command and completion structures are exposed to userspace
through the uapi version of nvme.h. They are not an ABI between the kernel
and userspace, and will change in C-incompatible way for future versions of
the spec. Move them to the kernel version of the file and rename the uapi
header to nvme_ioctl.h so that userspace can easily detect the presence of
the new clean header. Nvme-cli already carries a local copy of the header,
so it won't be affected by this move.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f11bb3e2 03-Oct-2015 Christoph Hellwig <hch@lst.de>

nvme: add a local nvme.h header

Add a new drivers/block/nvme.h which contains all the driver internal
interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 0a7385ad 02-Oct-2015 Keith Busch <kbusch@kernel.org>

NVMe: Simplify device resume on io queue failure

Releasing IO queues and disks was done in a work queue outside the
controller resume context to delete namespaces if the controller failed
after a resume from suspend. This is unnecessary since we can resume
a device asynchronously.

This patch makes resume use probe_work so it can directly remove
namespaces if the device is manageable but not IO capable. Since the
deleting disks was the only reason we had the convoluted "reset_workfn",
this patch removes that unnecessary indirection.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 188c3568 01-Oct-2015 Keith Busch <kbusch@kernel.org>

NVMe: Reference count open namespaces

Dynamic namespace attachment means the namespace may be removed at any
time, so the namespace reference count can not be tied to the device
reference count. This fixes a NULL dereference if an opened namespace
is detached from a controller.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 81f03fed 10-Aug-2015 Jon Derrick <jonathan.derrick@intel.com>

NVMe: Add nvme subsystem reset IOCTL

Controllers can perform optional subsystem resets as introduced in NVMe
1.1. This patch adds an IOCTL to trigger the subsystem reset by writing
"NVMe" to the NSSR register.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# dfbac8c7 10-Aug-2015 Keith Busch <kbusch@kernel.org>

NVMe: Add nvme subsystem reset support

Controllers part of an NVMe subsystem may be reset by any other controller
in the subsystem. If the device is capable of subsystem resets, this
patch adds detection for such events and performs appropriate controller
initialization upon subsystem reset detection.

The register bit is a RW1C type, so the driver needs to write a 1 to the
status bit to clear the subsystem reset occured bit during initialization.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8ffaadf7 20-Jul-2015 Jon Derrick <jonathan.derrick@intel.com>

NVMe: Use CMB for the IO SQes if available

Some controllers have a controller-side memory buffer available for use
for submissions, completions, lists, or data.

If a CMB is available, the entire CMB will be ioremapped and it will
attempt to map the IO SQes onto the CMB. The queues will be shrunk as
needed. The CMB will not be used if the queue depth is shrunk below some
threshold where it may have reduced performance over a larger queue
in system memory.

Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a5768aa8 01-Jun-2015 Keith Busch <kbusch@kernel.org>

NVMe: Automatic namespace rescan

Namespaces may be dynamically allocated and deleted or attached and
detached. This has the driver rescan the device for namespace changes
after each device reset or namespace change asynchronous event.

There could potentially be many detached namespaces that we don't want
polluting /dev/ with unusable block handles, so this will delete disks
if the namespace is not active as indicated by the response from identify
namespace. This also skips adding the disk if no capacity is provisioned
to the namespace in the first place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# d29ec824 22-May-2015 Christoph Hellwig <hch@lst.de>

nvme: submit internal commands through the block layer

Use block layer queues with an internal cmd_type to submit internally
generated NVMe commands. This both simplifies the code a lot and allow
for a better structure. For example now the LighNVM code can construct
commands without knowing the details of the underlying I/O descriptors.
Or a future NVMe over network target could inject commands, as well as
could the SCSI translation and ioctl code be reused for such a beast.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e75ec752 22-May-2015 Christoph Hellwig <hch@lst.de>

nvme: store a struct device pointer in struct nvme_dev

Most users want the generic device, so store that in struct nvme_dev
instead of the pci_dev. This also happens to be a nice step towards
making some code reusable for non-PCI transports.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f705f837 22-May-2015 Christoph Hellwig <hch@lst.de>

nvme: consolidate synchronous command submission helpers

Note that we keep the unused timeout argument, but allow callers to
pass 0 instead of a timeout if they want the default. This will allow
adding a timeout to the pass through path later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a67a9513 07-Apr-2015 Keith Busch <kbusch@kernel.org>

NVMe: Meta data handling through submit io ioctl

This adds support for the extended metadata formats through the submit
IO ioctl, and simplifies the rest when using a separate metadata format.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 07836e65 19-Feb-2015 Keith Busch <kbusch@kernel.org>

NVMe: Fix potential corruption during shutdown

The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.

This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is not inlined with pci probe anymore.

Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>


# 2e1d8448 12-Feb-2015 Keith Busch <kbusch@kernel.org>

NVMe: Asynchronous controller probe

This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.

Signed-off-by: Keith Busch <keith.busch@intel.com>


# b3fffdef 03-Feb-2015 Keith Busch <kbusch@kernel.org>

NVMe: Register management handle under nvme class

This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.

Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.

Signed-off-by: Keith Busch <keith.busch@intel.com>


# 4f1982b4 19-Feb-2015 Keith Busch <kbusch@kernel.org>

NVMe: Update SCSI Inquiry VPD 83h translation

The original translation created collisions on Inquiry VPD 83 for many
existing devices. Newer specifications provide other ways to translate
based on the device's version can be used to create unique identifiers.

Version 1.1 provides an EUI64 field that uniquely identifies each
namespace, and 1.2 added the longer NGUID field for the same reason.
Both follow the IEEE EUI format and readily translate to the SCSI device
identification EUI designator type 2h. For devices implementing either,
the translation will use this type, defaulting to the EUI64 8-byte type if
implemented then NGUID's 16 byte version if not. If neither are provided,
the 1.0 translation is used, and is updated to use the SCSI String format
to guarantee a unique identifier.

Knowing when to use the new fields depends on the nvme controller's
revision. The NVME_VS macro was not decoding this correctly, so that is
fixed in this patch and moved to a more appropriate place.

Since the Identify Namespace structure required an update for the NGUID
field, this patch adds the remaining new 1.2 fields to the structure.

Signed-off-by: Keith Busch <keith.busch@intel.com>


# e1e5e564 19-Feb-2015 Keith Busch <kbusch@kernel.org>

NVMe: Metadata format support

Adds support for NVMe metadata formats and exposes block devices for
all namespaces regardless of their format. Namespace formats that are
unusable will have disk capacity set to 0, but a handle to the block
device is created to simplify device management. A namespace is not
usable when the format requires host interleave block and metadata in
single buffer, has no provisioned storage, or has better data but failed
to register with blk integrity.

The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarilly set to 0 until it can be revalidated and registered with
the integrity extenstions after add_disk completes.

The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this be a single physically contiguous region,
so only one integrity segment is allowed per command. If the metadata
is used for T10 PI, the driver provides mappings to save and restore
the reftag physical block translation. The driver provides no-op
functions for generate and verify if metadata is not used for protection
information. This way the setup is always provided by the block layer.

If a request does not supply a required metadata buffer, the command
is failed with bad address. This could only happen if a user manually
disables verify/generate on such a disk. The only exception to where
this is okay is if the controller is capable of stripping/generating
the metadata, which is possible on some types of formats.

The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IOD's, but we don't do that
anymore, so the field was unused.

Signed-off-by: Keith Busch <keith.busch@intel.com>


# ac3dd5bd 22-Jan-2015 Jens Axboe <axboe@fb.com>

NVMe: avoid kmalloc/kfree for smaller IO

Currently we allocate an nvme_iod for each IO, which holds the
sg list, prps, and other IO related info. Set a threshold of
2 pages and/or 8KB of data, below which we can just embed this
in the per-command pdu in blk-mq. For any IO at or below
NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree.

For higher IOPS, this saves up to 1% of CPU time.

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>


# a4aea562 04-Nov-2014 Matias Bjørling <m@bjorling.me>

NVMe: Convert to blk-mq

This converts the NVMe driver to a blk-mq request-based driver.

The NVMe driver is currently bio-based and implements queue logic within
itself. By using blk-mq, a lot of these responsibilities can be moved
and simplified.

The patch is divided into the following blocks:

* Per-command data and cmdid have been moved into the struct request
field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id
maintenance are now handled by blk-mq through the rq->tag field.

* The logic for splitting bio's has been moved into the blk-mq layer.
The driver instead notifies the block layer about limited gap support in
SG lists.

* blk-mq handles timeouts and is reimplemented within nvme_timeout().
This both includes abort handling and command cancelation.

* Assignment of nvme queues to CPUs are replaced with the blk-mq
version. The current blk-mq strategy is to assign the number of
mapped queues and CPUs to provide synergy, while the nvme driver
assign as many nvme hw queues as possible. This can be implemented in
blk-mq if needed.

* NVMe queues are merged with the tags structure of blk-mq.

* blk-mq takes care of setup/teardown of nvme queues and guards invalid
accesses. Therefore, RCU-usage for nvme queues can be removed.

* IO tracing and accounting are handled by blk-mq and therefore removed.

* Queue suspension logic is replaced with the logic from the block
layer.

Contributions in this patch from:

Sam Bradshaw <sbradshaw@micron.com>
Jens Axboe <axboe@fb.com>
Keith Busch <keith.busch@intel.com>
Robert Nelson <rlnelson@google.com>

Acked-by: Keith Busch <keith.busch@intel.com>
Acked-by: Jens Axboe <axboe@fb.com>

Updated for new ->queue_rq() prototype.

Signed-off-by: Jens Axboe <axboe@fb.com>


# 1d090624 23-Jun-2014 Keith Busch <kbusch@kernel.org>

NVMe: Mismatched host/device page size support

Adds support for devices with max page size smaller than the host's.
In the case we encounter such a host/device combination, the driver will
split a page into as many PRP entries as necessary for the device's page
size capabilities. If the device's reported minimum page size is greater
than the host's, the driver will not attempt to enable the device and
return an error instead.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 6fccf938 18-Jun-2014 Keith Busch <kbusch@kernel.org>

NVMe: Async event request

Submits NVMe asynchronous event requests, one event up to the controller
maximum or number of possible different event types (8), whichever is
smaller. Events successfully returned by the controller are logged.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f3db22fe 11-Jun-2014 Keith Busch <kbusch@kernel.org>

NVMe: Fix hot cpu notification dead lock

There is a potential dead lock if a cpu event occurs during nvme probe
since it registered with hot cpu notification. This fixes the race by
having the module register with notification outside of probe rather
than have each device register.

The actual work is done in a scheduled work queue instead of in the
notifier since assigning IO queues has the potential to block if the
driver creates additional queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# bd67608a 03-Jun-2014 Matthew Wilcox <willy@infradead.org>

NVMe: Rename io_timeout to nvme_io_timeout

It's positively immoral to have a global variable called 'io_timeout'.
Keep the module parameter called io_timeout, though.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 53562be7 29-Apr-2014 Keith Busch <kbusch@kernel.org>

NVMe: Flush with data support

It is possible a filesystem may send a flush flagged bio with write
data. There is no such composite NVMe command, so the driver sends flush
and write separately.

The device is allowed to execute these commands in any order, so it was
possible the driver ends the bio after the write completes, but while the
flush is still active. We don't want to let a filesystem believe flush
succeeded before it really has; this could cause data corruption on a
power loss between these events. To fix, this patch splits the flush
and write into chained bios.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# a7d2ce28 29-Apr-2014 Keith Busch <kbusch@kernel.org>

NVMe: Configure support for block flush

This configures an nvme request_queue as flush capable if the device
has a volatile write cache present.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 8757ad65 11-Apr-2014 Matthew Wilcox <willy@infradead.org>

NVMe: Update copyright headers

Make the copyright dates accurate and remove the final paragraph that
includes the address of the FSF.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# edd10d33 03-Apr-2014 Keith Busch <kbusch@kernel.org>

NVMe: Retry failed commands with non-fatal errors

For commands returned with failed status, queue these for resubmission
and continue retrying them until success or for a limited amount of
time. The final timeout was arbitrarily chosen so requests can't be
retried indefinitely.

Since these are requeued on the nvmeq that submitted the command, the
callbacks have to take an nvmeq instead of an nvme_dev as a parameter
so that we can use the locked queue to append the iod to retry later.

The nvme_iod conviently can be used to track how long we've been trying
to successfully complete an iod request. The nvme_iod also provides the
nvme prp dma mappings, so I had to move a few things around so we can
keep those mappings.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fixed checkpatch issue with long line]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# b355084a 04-Apr-2014 Keith Busch <kbusch@kernel.org>

NVMe: Make I/O timeout a module parameter

Increase the default timeout to 30 seconds to match SCSI.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[use byte instead of ushort]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 33b1e95c 24-Mar-2014 Keith Busch <kbusch@kernel.org>

NVMe: CPU hot plug notification

Registers with hot cpu notification to rebalance, and potentially allocate
additional, io queues.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 42f61420 24-Mar-2014 Keith Busch <kbusch@kernel.org>

NVMe: per-cpu io queues

The device's IO queues are associated with CPUs, so we can use a per-cpu
variable to map the a qid to a cpu. This provides a convienient way
to optimally assign queues to multiple cpus when the device supports
fewer queues than the host has cpus. The previous implementation may
have assigned these poorly in these situations. This patch addresses
this by sharing queues among cpus that are "close" together and should
have a lower lock contention penalty.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 4f5099af 03-Mar-2014 Keith Busch <kbusch@kernel.org>

NVMe: IOCTL path RCU protect queue access

This adds rcu protected access to a queue in the nvme IOCTL path
to fix potential races between a surprise removal and queue usage in
nvme_submit_sync_cmd. The fix holds the rcu_read_lock() here to prevent
the nvme_queue from freeing while this path is executing so it can't
sleep, and so this path will no longer wait for a available command
id should they all be in use at the time a passthrough IOCTL request
is received.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 5a92e700 21-Feb-2014 Keith Busch <kbusch@kernel.org>

NVMe: RCU protected access to io queues

This adds rcu protected access to nvme_queue to fix a race between a
surprise removal freeing the queue and a thread with open reference on
a NVMe block device using that queue.

The queues do not need to be rcu protected during the initialization or
shutdown parts, so I've added a helper function for raw deferencing
to get around the sparse errors.

There is still a hole in the IOCTL path for the same problem, which is
fixed in a subsequent patch.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 9ca97374 07-Mar-2014 Tejun Heo <tj@kernel.org>

nvme: don't use PREPARE_WORK

PREPARE_[DELAYED_]WORK() are being phased out. They have few users
and a nasty surprise in terms of reentrancy guarantee as workqueue
considers work items to be different if they don't have the same work
function.

nvme_dev->reset_work is multiplexed with multiple work functions.
Introduce nvme_reset_workfn() which invokes nvme_dev->reset_workfn and
always use it as the work function and update the users to set the
->reset_workfn field instead of overriding the work function using
PREPARE_WORK().

It would probably be best to route this with other related updates
through the workqueue tree.

Compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: linux-nvme@lists.infradead.org


# c30341dc 10-Dec-2013 Keith Busch <kbusch@kernel.org>

NVMe: Abort timed out commands

Send nvme abort command to io requests that have timed out on an
initialized device. If the command is not returned after another timeout,
schedule the controller for reset.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fix endianness issues]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# d4b4ff8e 10-Dec-2013 Keith Busch <kbusch@kernel.org>

NVMe: Schedule reset for failed controllers

Schedules a controller reset when it indicates it has a failed status. If
the device does not become ready after a reset, the pci device will be
scheduled for removal.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[fixed checkpatch issue]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 9a6b9458 10-Dec-2013 Keith Busch <kbusch@kernel.org>

NVMe: Device resume error handling

Adds controller error handling on resume power management. If the device
fails to initialize, the device is queued for a reset. If the reset fails,
a thread is spawned to remove the pci device.

If the device resumes as "busy", the device is responding to admin
commands but will not create IO queues. In this case, we need to remove
the gendisks and free the IO queues since they can't be used and may be
holding bios in their lists.

From testing, the dma pools require a pci device so this had to change
the pci driver 'remove' to release the dma resources in line with that
call instead of after all references to the device are released.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 320a3827 23-Oct-2013 Keith Busch <kbusch@kernel.org>

NVMe: compat SG_IO ioctl

For 32-bit versions of sg3-utils running on a 64-bit system. This is
mostly a copy from the relevent portions of fs/compat_ioctl.c, with
slight modifications for going through block_device_operations.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
[fixed up CONFIG_COMPAT=n build problems]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# b80d5ccc 09-Sep-2013 Haiyan Hu <huhaiyan@huawei.com>

NVMe: Avoid shift operation when writing cq head doorbell

Changes the type of dev->db_stride to unsigned and changes the value
stored there to be 1 << the current value. Then there is less
calculation to be done at completion time.

Signed-off-by: Haiyan Hu <huhaiyan@huawei.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 1894d8f1 15-Jul-2013 Keith Busch <kbusch@kernel.org>

NVMe: Use normal shutdown

The NVMe spec recommends using the shutdown normal sequence when safely
taking the controller offline instead of hitting CC.EN on the next
start-up to reset the controller. The spec recommends a minimum of 1
second for the shutdown complete. This patch waits 2 seconds to be on
the safe side.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# c3bfe717 08-Jul-2013 Matthew Wilcox <willy@infradead.org>

NVMe: Namespace IDs are unsigned

The 'Number of Namespaces' read from the device was being treated as
signed, which would cause us to not scan any namespaces for a device
with more than 2 billion namespaces. That led to noticing that the
namespace ID was also being treated as signed, which could lead to the
result from NVME_IOCTL_ID being treated as an error code.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 42c77683 25-Jun-2013 Matthew Wilcox <willy@infradead.org>

NVMe: Split header file into user-visible and kernel-visible pieces

To build user programs that call the NVMe ioctls, we need to have a
user header file. Catch up to the new way of doing that by splitting
the header file into kernel and uapi portions.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 6198221f 29-May-2013 Keith Busch <kbusch@kernel.org>

NVMe: Disk IO statistics

Add io stats accounting for bio requests so nvme block devices show
useful disk stats.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# ab3ea5bf 06-May-2013 Matthew Wilcox <willy@infradead.org>

NVMe: Simplify Firmware Activate code slightly

Add definitions for the three Firmware Activate actions, and change the
SCSI translation code to construct the command into a temporary variable
instead of translating the endianness back-and-forth.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>


# f410c680 23-Apr-2013 Keith Busch <kbusch@kernel.org>

NVMe: Meta-data support in NVME_IOCTL_SUBMIT_IO

This adds support for namespaces with separate meta-data formats in the
submit io ioctl. The meta-data buffer has to be a contiguous, so such
a buffer is allocated and the mapped user pages are copied to/from this
buffer for write/read commands.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 159b67d7 09-Apr-2013 Keith Busch <kbusch@kernel.org>

NVMe: Device specific stripe size handling

We have an nvme device that has a concept of a stripe size. IO requests
that do not transfer data crossing a stripe boundary has greater
performance compared to IO that does cross it. This patch sets the
stripe size for the device if the device and vendor ids match one with
this feature and splits IO requests that cross the stripe boundary.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 5e82e952 19-Feb-2013 Keith Busch <kbusch@kernel.org>

NVMe: Add a character device for each nvme device

Registers a miscellaneous device for each nvme controller probed. This
creates character device files as /dev/nvmeN, where N is the device
instance, and supports nvme admin ioctl commands so devices without
namespaces can be managed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 1c9b5265 16-Apr-2013 Matthew Wilcox <willy@infradead.org>

NVMe: Fix endian-related problems in user I/O submission path

When constructing the command, dsmgmt needs to be treated as a 32-bit
value, not a 16-bit value. reftag, apptag and appmask all need to be
converted from native-endian to little-endian. Again, sparse's bitwise
warnings caught this problem. Thanks to Keith for pointing out the
correct way to fix the reftag.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>


# 063cc6d5 27-Mar-2013 Matthew Wilcox <willy@infradead.org>

NVMe: Abstract out sector to block number conversion

Introduce nvme_block_nr() to help convert sectors to block numbers.
This fixes an integer overflow in the SCSI conversion layer, and it's
slightly less typing than opencoding it.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>


# 5d0f6131 04-Mar-2013 Vishal Verma <vishal.l.verma@intel.com>

NVMe: Add nvme-scsi.c

Translates SCSI commands in SG_IO ioctl to NVMe commands.
Uses the scsi-nvme translation spec from nvmexpress.org as reference.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# f8ebf840 27-Mar-2013 Vishal Verma <vishal.l.verma@intel.com>

NVMe: Add definitions for format command

The SCSI emulation has the ability to send format commands, so we need
to add the definition of the command. Also add a missing error code.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 13c3b0fc 04-Mar-2013 Vishal Verma <vishal.l.verma@intel.com>

NVMe: Move structures & definitions to header file

nvme-scsi.c uses several data structures and definitions that were
previously private to nvme-core.c. Move the definitions to nvme.h,
protected by __KERNEL__.

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 0e5e4f0e 09-Nov-2012 Keith Busch <kbusch@kernel.org>

NVMe: Add discard support for capable devices

This adds discard support to block queues if the nvme device is capable of
deallocating blocks as indicated by the controller's optional command support.
A discard flagged bio request will submit an NVMe deallocate Data Set
Management command for the requested blocks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 6ecec745 26-Sep-2012 Keith Busch <kbusch@kernel.org>

NVMe: Define SMART log

This data structure is defined in the NVMe specification. It's not used
by the kernel, but is available for use by userspace software.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# a0cadb85 27-Jul-2012 Keith Busch <kbusch@kernel.org>

NVMe: Do not set IO queue depth beyond device max

Set the depth for IO queues to the device's maximum supported queue
entries if the requested depth exceeds the device's capabilities.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 8fc23e03 26-Jul-2012 Keith Busch <kbusch@kernel.org>

NVMe: Set block queue max sectors

Set the max hw sectors in a namespace's request queue if the nvme device
has a max data transfer size.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 010e646b 04-Nov-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Update Identify Controller data structure

The driver was still using an old definition of Identify Controller
which only came to light once we started using the 'number of namespaces'
field properly.

Reported-by: Nisheeth Bhat <nisheeth.bhat@intel.com>
Reported-by: Khosrow Panah <Khosrow.Panah@idt.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# f1938f6e 20-Oct-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Implement doorbell stride capability

The doorbell stride allows devices to spread out their doorbells instead
of packing them tightly. This feature was added as part of ECN 003.

This patch also enables support for more than 512 queues :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 6bbf1acd 20-May-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Rework ioctls

Remove the special-purpose IDENTIFY, GET_RANGE_TYPE, DOWNLOAD_FIRMWARE
and ACTIVATE_FIRMWARE commands. Replace them with a generic ADMIN_CMD
ioctl that can submit any admin command.

Add a new ID ioctl that returns the namespace ID of the queried device.
It corresponds to the SCSI Idlun ioctl.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 22605f96 19-Apr-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Time out initialisation after a few seconds

THe device reports (in its capability register) how long it will take
to initialise. If that time elapses before the ready bit becomes set,
conclude the device is broken and refuse to initialise it. Log a nice
error message so the user knows why we did nothing.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 7f53f9d2 22-Mar-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Correct the Controller Configuration settings

The arbitration field was extended by one bit, shifting the shutdown
notification bits by one. Also, the SQ/CQ entry size was made
configurable for future extensions.

Reported-by: Paul Luse <paul.e.luse@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 6c7d4945 21-Mar-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Change the definition of nvme_user_io

The read and write commands don't define a 'result', so there's no need
to copy it back to userspace.

Remove the ability of the ioctl to submit commands to a different
namespace; it's just asking for trouble, and the use case I have in mind
will be addressed througha different ioctl in the future. That removes
the need for both the block_shift and nsid arguments.

Check that the opcode is one of 'read' or 'write'. Future opcodes may
be added in the future, but we will need a different structure definition
for them.

The nblocks field is redefined to be 0-based. This allows the user to
request the full 65536 blocks.

Don't byteswap the reftag, apptag and appmask. Martin Petersen tells
me these are calculated in big-endian and are transmitted to the device
in big-endian.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 9d4af1b7 20-Mar-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Correct the definitions of two ioctls

NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command
as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read.

Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 19e899b2 16-Mar-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Remove outdated comments

The head can never overrun the tail since we won't allocate enough command
IDs to let that happen. The status codes are in sync with the spec.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 2ddc4f74 28-Feb-2011 Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com>

NVMe: Update admin opcodes to match the 1.0RC spec

Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 897cfe1c 13-Feb-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Update BAR structure to match the current spec

Add two reserved registers in the middle of the BAR to match the 1.0
spec plus ECN 0002.

Also rename IMC and ISC to INTMC and INTSC to conform with the spec.
We still don't need to use them :-)

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 6ee44cdc 03-Feb-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Add download / activate firmware ioctls

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 7a63e07b 03-Feb-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Add remaining status codes

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# a53295b6 01-Feb-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Add NVME_IOCTL_SUBMIT_IO

Allow userspace to submit synchronous I/O like the SCSI sg interface does.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# 7b4fe9b1 26-Jan-2011 Matthew Wilcox <willy@infradead.org>

NVMe: Make nvme_common_command more featureful

Add prp1, prp2 and the metadata prp to the common command, since the
fields are generally used this way.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>


# b60503ba 19-Jan-2011 Matthew Wilcox <willy@infradead.org>

NVMe: New driver

This driver is for devices that follow the NVM Express standard

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>