History log of /linux-master/drivers/nvme/host/fabrics.h
Revision Date Author Comments
# de105068 10-Mar-2024 Chunguang Xu <chunguang.xu@shopee.com>

nvme: fix reconnection fail due to reserved tag allocation

We found a issue on production environment while using NVMe over RDMA,
admin_q reconnect failed forever while remote target and network is ok.
After dig into it, we found it may caused by a ABBA deadlock due to tag
allocation. In my case, the tag was hold by a keep alive request
waiting inside admin_q, as we quiesced admin_q while reset ctrl, so the
request maked as idle and will not process before reset success. As
fabric_q shares tagset with admin_q, while reconnect remote target, we
need a tag for connect command, but the only one reserved tag was held
by keep alive command which waiting inside admin_q. As a result, we
failed to reconnect admin_q forever. In order to fix this issue, I
think we should keep two reserved tags for admin queue.

Fixes: ed01fee283a0 ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 6d3c7fb1 24-Jan-2024 Keith Busch <kbusch@kernel.org>

nvme: use ctrl state accessor

The ctrl->state value is updated in another thread using WRITE_ONCE, so
ensure all the readers use the appropriate accessor.

Reviewed-by: Sagi Grimberg <sagi@grmberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# adf22c52 24-Aug-2023 Hannes Reinecke <hare@suse.de>

nvme-fabrics: parse options 'keyring' and 'tls_key'

Parse the fabrics options 'keyring' and 'tls_key' and store the
referenced keys in the options structure.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# be8e82ca 24-Aug-2023 Hannes Reinecke <hare@suse.de>

nvme-tcp: enable TLS handshake upcall

Add a fabrics option 'tls' and start the TLS handshake upcall
with the default PSK. When TLS is started the PSK key serial
number is displayed in the sysfs attribute 'tls_key'

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 5e4b55fa 12-May-2023 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme-fabrics: check hostid using uuid_equal

Use a dedicated function to match uuids instead of duplicating it.

Tested-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# e64b0c80 01-May-2023 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

nvmet: reorder fields in 'struct nvmf_ctrl_options'

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmf_ctrl_options' from 136 to
128 bytes.

When such a structure is allocated in nvmf_create_ctrl(), because of the
way memory allocation works, when 136 bytes were requested, 192 bytes were
allocated.

So this saves 64 bytes per allocation, 1 cache line to hold the whole
structure and a few cycles when zeroing the memory in nvmf_create_ctrl().

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# a249d306 26-Apr-2023 Keith Busch <kbusch@kernel.org>

nvme-fabrics: add queue setup helpers

tcp and rdma transports have lots of duplicate code setting up the
different queue mappings. Add common helpers.

Cc: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 26a57cb3 21-Feb-2023 Daniel Wagner <dwagner@suse.de>

nvme-fabrics: show well known discovery name

The kernel always logs the unique subsystem name for a discovery
controller, even in the case user space asked for the well known.

This has lead to confusion as the logs of nvme-cli and the kernel
logs didn't match.

First, nvme-cli connects to the well known discovery controller to
figure out if it supports TP8013. If so then nvme-cli disconnects and
connects to the unique discovery controller. Currently, the kernel show
that user space connected twice to the unique one.

To avoid further confusion, show the well known discovery controller if
user space asked for it:

$ nvme connect-all -v -t tcp -a 192.168.0.1
nvme0: nqn.2014-08.org.nvmexpress.discovery connected
nvme0: nqn.2014-08.org.nvmexpress.discovery disconnected
nvme0: nqn.discovery connected

kernel log:
nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.1:8009
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme0: new ctrl: NQN "nqn.discovery", addr 192.168.0.1:8009

Fixes: e5ea42faa773 ("nvme: display correct subsystem NQN")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f50fff73 27-Jun-2022 Hannes Reinecke <hare@suse.de>

nvme: implement In-Band authentication

Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006.
This patch adds two new fabric options 'dhchap_secret' to specify the
pre-shared key (in ASCII respresentation according to NVMe 2.0 section
8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify
the pre-shared controller key for bi-directional authentication of both
the host and the controller.
Re-authentication can be triggered by writing the PSK into the new
controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[axboe: fold in clang build fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 93ba75c9 30-Mar-2022 Chaitanya Kulkarni <kch@nvidia.com>

nvme-fabrics: add a request timeout helper

The RDAMA and TCP transport both complete the timed out request in the
same manner and hence code is duplicated. Add and use the helper
nvmf_complete_timed_out_request() to remove the duplicate code.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6a51abde 20-Jan-2022 Uday Shankar <ushankar@purestorage.com>

nvme-fabrics: fix state check in nvmf_ctlr_matches_baseopts()

Controller deletion/reset, immediately followed by or concurrent with
a reconnect, is hard failing the connect attempt resulting in a
complete loss of connectivity to the controller.

In the connect request, fabrics looks for an existing controller with
the same address components and aborts the connect if a controller
already exists and the duplicate connect option isn't set. The match
routine filters out controllers that are dead or dying, so they don't
interfere with the new connect request.

When NVME_CTRL_DELETING_NOIO was added, it missed updating the state
filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this
new state, it's seen as a live controller and fails the connect request.

Correct by adding the DELETING_NIO state to the match checks.

Fixes: ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
Cc: <stable@vger.kernel.org> # v5.7+
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e5ea42fa 22-Sep-2021 Hannes Reinecke <hare@suse.de>

nvme: display correct subsystem NQN

With discovery controllers supporting unique subsystem NQNs the
actual subsystem NQN might be different from that one passed in
via the connect args. So add a helper to display the resulting
subsystem NQN.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 20e8b689 22-Sep-2021 Hannes Reinecke <hare@suse.de>

nvme: Add connect option 'discovery'

Add a connect option 'discovery' to specify that the connection
should be made to a discovery controller, not a normal I/O controller.
With discovery controllers supporting unique subsystem NQNs we
cannot easily distinguish by the subsystem NQN if this should be
a discovery connection, but we need this information to blank out
options not supported by discovery controllers.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# be42a33b 10-Jun-2021 Keith Busch <kbusch@kernel.org>

nvme: use blk_execute_rq() for passthrough commands

The generic blk_execute_rq() knows how to handle polled completions. Use
that instead of implementing an nvme specific handler.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# daf6e8c9 09-Jun-2021 Shai Malin <smalin@marvell.com>

Revert "nvme-tcp-offload: ULP Series"

This reverts commits:
- 762411542050dbe27c7c96f13c57f93da5d9b89a
nvme: NVME_TCP_OFFLOAD should not default to m
- 5ff5622ea1f16d535f1be4e478e712ef48fe183b:
Merge branch 'NVMeTCP-Offload-ULP'

As requested on the mailing-list: https://lore.kernel.org/netdev/SJ0PR18MB3882C20793EA35A3E8DAE300CC379@SJ0PR18MB3882.namprd18.prod.outlook.com/
This patch will revert the nvme-tcp-offload ULP from net-next.

The nvme-tcp-offload ULP series will continue to be considered only on
linux-nvme@lists.infradead.org.

Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# af527935 02-Jun-2021 Prabhakar Kushwaha <pkushwaha@marvell.com>

nvme-fabrics: Expose nvmf_check_required_opts() globally

nvmf_check_required_opts() is used to check if user provided opts has
the required_opts or not. if not, it will log which options are not
provided.

It can be leveraged by nvme-tcp-offload to check if provided opts are
supported by this specific vendor driver or not.

So expose nvmf_check_required_opts() globally.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 98a5097d 02-Jun-2021 Prabhakar Kushwaha <pkushwaha@marvell.com>

nvme-fabrics: Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions

Move NVMF_ALLOWED_OPTS and NVMF_REQUIRED_OPTS definitions
to header file, so it can be used by the different HW devices.

NVMeTCP offload devices might have different limitations of the
allowed options, for example, a device that does not support all the
queue types. With tcp and rdma, only the nvme-tcp and nvme-rdma layers
handle those attributes and the HW devices do not create any limitations
for the allowed options.

An alternative design could be to add separate fields in
nvme_tcp_ofld_ops such as max_hw_sectors and max_segments that
we already have in this series.

Acked-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: Arie Gershberg <agershberg@marvell.com>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
Signed-off-by: Omkar Kulkarni <okulkarni@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Shai Malin <smalin@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: David S. Miller <davem@davemloft.net>


# 3ede8f72 20-May-2021 Martin Belanger <martin.belanger@dell.com>

nvme-tcp: allow selecting the network interface for connections

In our application, we need a way to force TCP connections to go out a
specific IP interface instead of letting Linux select the interface
based on the routing tables.

Add the 'host-iface' option to allow specifying the interface to use.
When the option host-iface is specified, the driver uses the specified
interface to set the option SO_BINDTODEVICE on the TCP socket before
connecting.

This new option is needed in addtion to the existing host-traddr for
the following reasons:

Specifying an IP interface by its associated IP address is less
intuitive than specifying the actual interface name and, in some cases,
simply doesn't work. That's because the association between interfaces
and IP addresses is not predictable. IP addresses can be changed or can
change by themselves over time (e.g. DHCP). Interface names are
predictable [1] and will persist over time. Consider the following
configuration.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 100.0.0.100/24 scope global lo
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
inet 100.0.0.100/24 scope global enp0s3
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
inet 100.0.0.100/24 scope global enp0s8
valid_lft forever preferred_lft forever

The above is a VM that I configured with the same IP address
(100.0.0.100) on all interfaces. Doing a reverse lookup to identify the
unique interface associated with 100.0.0.100 does not work here. And
this is why the option host_iface is required. I understand that the
above config does not represent a standard host system, but I'm using
this to prove a point: "We can never know how users will configure
their systems". By te way, The above configuration is perfectly fine
by Linux.

The current TCP implementation for host_traddr performs a
bind()-before-connect(). This is a common construct to set the source
IP address on a TCP socket before connecting. This has no effect on how
Linux selects the interface for the connection. That's because Linux
uses the Weak End System model as described in RFC1122 [2]. On the other
hand, setting the Source IP Address has benefits and should be supported
by linux-nvme. In fact, setting the Source IP Address is a mandatory
FedGov requirement (e.g. connection to a RADIUS/TACACS+ server).
Consider the following configuration.

$ ip addr list dev enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
valid_lft 426sec preferred_lft 426sec
inet 192.168.56.102/24 scope global secondary enp0s8
valid_lft forever preferred_lft forever
inet 192.168.56.103/24 scope global secondary enp0s8
valid_lft forever preferred_lft forever
inet 192.168.56.104/24 scope global secondary enp0s8
valid_lft forever preferred_lft forever

Here we can see that several addresses are associated with interface
enp0s8. By default, Linux always selects the default IP address,
192.168.56.101, as the source address when connecting over interface
enp0s8. Some users, however, want the ability to specify a different
source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option
host_traddr can be used as-is to perform this function.

In conclusion, I believe that we need 2 options for TCP connections.
One that can be used to specify an interface (host-iface). And one that
can be used to set the source address (host-traddr). Users should be
allowed to use one or the other, or both, or none. Of course, the
documentation for host_traddr will need some clarification. It should
state that when used for TCP connection, this option only sets the
source address. And the documentation for host_iface should say that
this option is only available for TCP connections.

References:
[1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
[2] https://tools.ietf.org/html/rfc1122

Tested both IPv4 and IPv6 connections.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# a9715744 25-Apr-2021 Tao Chiu <taochiu@synology.com>

nvme: move the fabrics queue ready check routines to core

queue_rq() in pci only checks if the dispatched queue (nvmeq) is ready,
e.g. not being suspended. Since nvme_alloc_admin_tags() in reset flow
restarts the admin queue, users are able to submit admin commands to a
controller before reset_work() completes. Commands submitted under this
condition may interfere with commands that performs identify, IO queue
setup in reset_work(), and may result in a hang described in the
following patch.

As seen in the fabrics, user commands are prevented from being executed
under inproper controller states. We may reuse this logic to maintain a
clear admin queue during reset_work().

Signed-off-by: Tao Chiu <taochiu@synology.com>
Signed-off-by: Cody Wong <codywong@synology.com>
Reviewed-by: Leon Chien <leonchien@synology.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ed01fee2 03-Mar-2021 Christoph Hellwig <hch@lst.de>

nvme-fabrics: only reserve a single tag

Fabrics drivers currently reserve two tags on the admin queue. But
given that the connect command is only run on a freshly created queue
or after all commands have been force aborted we only need to reserve
a single tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>


# 8c4dfea9 24-Nov-2020 Victor Gladkov <Victor.Gladkov@kioxia.com>

nvme-fabrics: reject I/O to offline device

Commands get stuck while Host NVMe-oF controller is in reconnect state.
The controller enters into reconnect state when it loses connection with
the target. It tries to reconnect every 10 seconds (default) until
a successful reconnect or until the reconnect time-out is reached.
The default reconnect time out is 10 minutes.

Applications are expecting commands to complete with success or error
within a certain timeout (30 seconds by default). The NVMe host is
enforcing that timeout while it is connected, but during reconnect the
timeout is not enforced and commands may get stuck for a long period or
even forever.

To fix this long delay due to the default timeout, introduce new
"fast_io_fail_tmo" session parameter. The timeout is measured in seconds
from the controller reconnect and any command beyond that timeout is
rejected. The new parameter value may be passed during 'connect'.
The default value of -1 means no timeout (similar to current behavior).

Signed-off-by: Victor Gladkov <victor.gladkov@kioxia.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ecca390e 22-Jul-2020 Sagi Grimberg <sagi@grimberg.me>

nvme: fix deadlock in disconnect during scan_work and/or ana_work

A deadlock happens in the following scenario with multipath:
1) scan_work(nvme0) detects a new nsid while nvme0
is an optimized path to it, path nvme1 happens to be
inaccessible.

2) Before scan_work is complete nvme0 disconnect is initiated
nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING

3) scan_work(1) attempts to submit IO,
but nvme_path_is_optimized() observes nvme0 is not LIVE.
Since nvme1 is a possible path IO is requeued and scan_work hangs.

--
Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: efi_partition+0x1e6/0x708
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
--

4) Delete also hangs in flush_work(ctrl->scan_work)
from nvme_remove_namespaces().

Similiarly a deadlock with ana_work may happen: if ana_work has started
and calls nvme_mpath_set_live and device_add_disk, it will
trigger I/O. When we trigger disconnect I/O will block because
our accessible (optimized) path is disconnecting, but the alternate
path is inaccessible, so I/O blocks. Then disconnect tries to flush
the ana_work and hangs.

[ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
[ 605.552087] Call Trace:
[ 605.552683] __schedule+0x2b9/0x6c0
[ 605.553507] schedule+0x42/0xb0
[ 605.554201] io_schedule+0x16/0x40
[ 605.555012] do_read_cache_page+0x438/0x830
[ 605.556925] read_cache_page+0x12/0x20
[ 605.557757] read_dev_sector+0x27/0xc0
[ 605.558587] amiga_partition+0x4d/0x4c5
[ 605.561278] check_partition+0x154/0x244
[ 605.562138] rescan_partitions+0xae/0x280
[ 605.563076] __blkdev_get+0x40f/0x560
[ 605.563830] blkdev_get+0x3d/0x140
[ 605.564500] __device_add_disk+0x388/0x480
[ 605.565316] device_add_disk+0x13/0x20
[ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core]
[ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
[ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core]
[ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core]
[ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core]
[ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core]
[ 605.573330] process_one_work+0x1db/0x380
[ 605.574144] worker_thread+0x4d/0x400
[ 605.574896] kthread+0x104/0x140
[ 605.577205] ret_from_fork+0x35/0x40
[ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
[ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830
[ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.582320] nvme D 0 14044 14043 0x00000000
[ 605.583424] Call Trace:
[ 605.583935] __schedule+0x2b9/0x6c0
[ 605.584625] schedule+0x42/0xb0
[ 605.585290] schedule_timeout+0x203/0x2f0
[ 605.588493] wait_for_completion+0xb1/0x120
[ 605.590066] __flush_work+0x123/0x1d0
[ 605.591758] __cancel_work_timer+0x10e/0x190
[ 605.593542] cancel_work_sync+0x10/0x20
[ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core]
[ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core]
[ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
[ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core]
[ 605.598320] dev_attr_store+0x17/0x30

Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will
indicate the phase of controller deletion where I/O cannot be allowed
to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
be issued to the bottom device, and only after we flush the ana_work
and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work
from re-firing by aborting early if we are not LIVE, so we should be safe
here.

In addition, change the transport drivers to follow the updated state
machine.

Fixes: 0d0b660f214d ("nvme: add ANA support")
Reported-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5d02a5c1 03-Sep-2019 Keith Busch <kbusch@kernel.org>

nvme: Remove ADMIN_ONLY state

The admin only state was intended to fence off actions that don't
apply to a non-IO capable controller. The only actual user of this is
the scan_work, and pci was the only transport to ever set this state.
The consequence of having this state is placing an additional burden on
every other action that applies to both live and admin only controllers.

Remove the admin only state and place the admin only burden on the only
place that actually cares: scan_work.

This also prepares to make it easier to temporarily pause a LIVE state
so that we don't need to remember which state the controller had been in
prior to the pause.

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 52b4451a 17-Aug-2019 Israel Rukshin <israelr@mellanox.com>

nvme-fabrics: Add type of service (TOS) configuration

TOS is user-defined and needs to be configured via nvme-cli.
It must be set before initiating any traffic and once set the TOS
cannot be changed.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 9002c4e5 18-Feb-2019 Christoph Hellwig <hch@lst.de>

nvme-fabrics: convert to SPDX identifiers

Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# 89d43802 14-Dec-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: allow user to pass in nr_poll_queues

This argument will specify how many polling I/O queues to connect when
creating the controller. These I/O queues will host I/O that is set with
REQ_HIPRI.

Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 26c68227 14-Dec-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: allow nvmf_connect_io_queue to poll

Preparation for polling support for fabrics. Polling support
means that our completion queues are not generating any interrupts
which means we need to poll for the nvmf io queue connect as well.

Reviewed by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 330f6b8a 12-Dec-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: allow user to set nr_write_queues for separate queue maps

This argument will specify how many I/O queues will be connected in
create_ctrl in addition to nr_io_queues. With this configuration, I/O
that carries payload from the host to the target, will use the default
hctx queue map, and I/O that involves target to host transfers will use
the read hctx queue map.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# fa9a1811 12-Dec-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: add missing nvmf_ctrl_options documentation

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 20d44e86 03-Dec-2018 Sagi Grimberg <sagi@lightbitslabs.com>

nvme-fabrics: allow user passing data digest

Data digest is a nvme-tcp specific feature, but nothing prevents other
transports reusing the concept so do not associate with tcp transport
solely.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3b49fa80 03-Dec-2018 Sagi Grimberg <sagi@lightbitslabs.com>

nvme-fabrics: allow user passing header digest

Header digest is a nvme-tcp specific feature, but nothing prevents other
transports reusing the concept so do not associate with tcp transport
solely.

Signed-off-by: Sagi Grimberg <sagi@lightbitslabs.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8154ed73 19-Nov-2018 Sagi Grimberg <sagi@grimberg.me>

nvme: disable fabrics SQ flow control when asked by the user

As for now, we don't care about sq_head pointer updates anyway, so
at least allow the controller to micro-optimize by omiting this update.

Note that we will probably need to support it when a controller
that requires this comes along.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b7c7be6f6 18-Oct-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: move controller options matching to fabrics

IP transports will most likely use the same controller options
matching when detecting a duplicate connect. Move it to
fabrics.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6cdefc6e 20-Jul-2018 James Smart <jsmart2021@gmail.com>

nvme: if_ready checks to fail io to deleting controller

The revised if_ready checks skipped over the case of returning error when
the controller is being deleted. Instead it was returning BUSY, which
caused the ios to retry, which caused the ns delete to hang waiting for
the ios to drain.

Stack trace of hang looks like:
kworker/u64:2 D 0 74 2 0x80000000
Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
Call Trace:
? __schedule+0x26d/0x820
schedule+0x32/0x80
blk_mq_freeze_queue_wait+0x36/0x80
? remove_wait_queue+0x60/0x60
blk_cleanup_queue+0x72/0x160
nvme_ns_remove+0x106/0x140 [nvme_core]
nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
process_one_work+0x160/0x350
worker_thread+0x1c3/0x3d0
kthread+0xf5/0x130
? process_one_work+0x350/0x350
? kthread_bind+0x10/0x10
ret_from_fork+0x1f/0x30

Extend nvmf_fail_nonready_command() to supply the controller pointer so
that the controller state can be looked at. Fail any io to a controller
that is deleting.

Fixes: 3bc32bb1186c ("nvme-fabrics: refactor queue ready check")
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>


# 278ab379 11-Jun-2018 Christoph Hellwig <hch@lst.de>

nvme-fabrics: handle the admin-only case properly in nvmf_check_ready

In the ADMIN_ONLY state we don't have any I/O queues, but we should accept
all admin commands without further checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>


# 3bc32bb1 11-Jun-2018 Christoph Hellwig <hch@lst.de>

nvme-fabrics: refactor queue ready check

Move the is_connected check to the fibre channel transport, as it has no
meaning for other transports. To facilitate this split out a new
nvmf_fail_nonready_command helper that is called by the transport when
it is asked to handle a command on a queue that is not ready.

Also avoid a function call for the queue live fast path by inlining
the check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>


# 12a0b662 01-Jun-2018 Johannes Thumshirn <jthumshirn@suse.de>

nvme: don't hold nvmf_transports_rwsem for more than transport lookups

Only take nvmf_transports_rwsem when doing a lookup of registered
transports, so that a blocking ->create_ctrl doesn't prevent other
actions on /dev/nvme-fabrics.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
[hch: increased lock hold time a bit to be safe, added a comment
and updated the changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ab4f47a9 25-May-2018 James Smart <jsmart2021@gmail.com>

nvme: allow duplicate controller if prior controller being deleted

The current checks for whether a new controller request "matches" an
existing controller ignores controller state and checks identity strings.
There are cases where an existing controller may be in its last steps of
deletion when they are "matched" by a new connection.

Change the behavior so that the new connection ignores controllers that
are deleted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# bb06ec31 12-Apr-2018 James Smart <jsmart2021@gmail.com>

nvme: expand nvmf_check_if_ready checks

The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows a lot of cases to fail ios during windows
of reset or re-connection. In cases where there are not multi-path
options present, the error goes back to the callee - the filesystem
or application. Not good.

The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.

The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted. When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.

Admin commands issued via ioctl share a live admin queue with commands
from the transport for controller init. The ioctls could be intermixed
with the initialization commands. It's possible for the ioctl cmd to
be issued prior to the controller being enabled. To block this, the
ioctl admin commands need to be distinguished from admin commands used
for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
reflect this division and set it on ioctls requests. As the
nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
ensure that commands allocated by the ioctl path (actually anything
in core.c) preps the nvme_req(req) before starting the io. This will
preserve the USERCMD flag during execution and/or retry.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ad6a0a52 31-Jan-2018 Max Gurtovoy <maxg@mellanox.com>

nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING

In pci transport, this state is used to mark the initialization
process. This should be also used in other transports as well.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 0de5cd36 25-Dec-2017 Roy Shterman <roys@lightbitslabs.com>

nvme-fabrics: protect against module unload during create_ctrl

NVMe transport driver module unload may (and usually does) trigger
iteration over the active controllers and delete them all (sometimes
under a mutex). However, a controller can be created concurrently with
module unload which can lead to leakage of resources (most important char
device node leakage) in case the controller creation occured after the
unload delete and drain sequence. To protect against this, we take a
module reference to guarantee that the nvme transport driver is not
unloaded while creating a controller.

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 48832f8d 24-Oct-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: introduce init command check for a queue that is not alive

When the fabrics queue is not alive and fully functional, no commands
should be allowed to pass but connect (which moves the queue to a fully
functional state). Any other command should be failed, with either
temporary status BLK_STS_RESOUCE or permanent status BLK_STS_IOERR.

This is shared across all fabrics, hence move the check to fabrics
library.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 991231dc 20-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme: add helper to compare options to controller

Adds a helper function that compares the host and subsytem
specified in a connect options list vs a controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3b338762 20-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme: add duplicate_connect option

Add the "duplicate_connect" boolean option (presence means true).
Default is false.

When false, the transport should validate whether a new controller request
is targeted for the same host transport addressing and target transport
addressing as an existing controller. If so, the new controller request
should be rejected.

When true, the callee is explicitly requesting a duplicate controller
connection to be made and the new request should be attempted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 180de007 25-Jun-2017 Christoph Hellwig <hch@lst.de>

nvme: read the subsystem NQN from Identify Controller

NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures. Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it. For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6bfe0425 20-Jun-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvme: add hostid token to fabric options

Currently we have no way to define a stable host-id but always use the one
which is randomly generated when we add the host or use the default host.

Provide a "hostid=%s" for user-space to pass in a persistent host-id which
overrides the randomly generated one.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fdf9dfa8 04-May-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: move nr_reconnects to nvme_ctrl

It is not a user option but rather a variable controller
attribute.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8e412263 17-May-2017 Christoph Hellwig <hch@lst.de>

nvme: switch to uuid_t

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 42a45274 18-Mar-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: Allow ctrl loss timeout configuration

When a host sense that its controller session is damaged,
it tries to re-establish it periodically (reconnect every
reconnect_delay). It may very well be that the controller
is gone and never coming back, in this case the host will
try to reconnect forever.

Add a ctrl_loss_tmo to bound the number of reconnect attempts
to a specific controller (default to a reasonable 10 minutes).
The timeout configuration is actually translated into number of
reconnect attempts and not a schedule on its own but rather
divided with reconnect_delay. This is useful to prevent
racing flows of remove and reconnect, and it doesn't really
matter if we remove slightly sooner than what the user requested.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e5a39dd8 27-Jan-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvme: make nvmf_register_transport require a create_ctrl callback

nvmf_create_ctrl() relys on the presence of a create_crtl callback in the
registered nvmf_transport_ops, so make nvmf_register_transport require one.

Update the available call-sites as well to reflect these changes.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 478bcb93 02-Aug-2016 James Smart <james.smart@broadcom.com>

nvme-fabrics: Add host_traddr options field to host infrastructure

Add the host_traddr field to allow specification of the host-port
connection info for the transport. Will be used by FC transport.

Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 4a9f05c5 02-Aug-2016 James Smart <james.smart@broadcom.com>

nvme-fabrics: revise host transport option descriptions

Revise some of the comments so not so ethernet-network centric

Signed-off-by: James Smart <james.smart@broadcom.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 7a665d2f 28-Jun-2016 Daniel Verkamp <daniel.verkamp@intel.com>

nvme-fabrics: change NQN UUID to big-endian format

NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the
UUID format of NQN uses a UUID based on RFC 4122.

RFC 4122 specifies that the UUID is encoded in big-endian byte order.

Switch the NVMe over Fabrics host ID field from little-endian UUID to
big-endian UUID to match the specification.

Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 6a92967c 22-Jun-2016 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: Remove tl_retry_count

The timeout before error recovery logic kicks in is
dictated by the nvme keep-alive, so we don't really need
a transport layer retry count. transports can retry for
as much as they like.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 038bd4cb 13-Jun-2016 Sagi Grimberg <sagi@grimberg.me>

nvme: add keep-alive support

Periodic keep-alive is a mandatory feature in NVMe over Fabrics, and
optional in NVMe 1.2.1 for PCIe. This patch adds periodic keep-alive
sent from the host to verify that the controller is still responsive
and vice-versa. The keep-alive timeout is user-defined (with
keep_alive_tmo connection parameter) and defaults to 5 seconds.

In order to avoid a race condition where the host sends a keep-alive
competing with the target side keep-alive timeout expiration, the host
adds a grace period of 10 seconds when publishing the keep-alive timeout
to the target.

In case a keep-alive failed (or timed out), a transport specific error
recovery kicks in.

For now only NVMe over Fabrics is wired up to support keep alive, but
we can add PCIe support easily once controllers actually supporting it
become available.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@chelsio.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 07bfcd09 13-Jun-2016 Christoph Hellwig <hch@lst.de>

nvme-fabrics: add a generic NVMe over Fabrics library

The NVMe over Fabrics library provides an interface for both transports
and the nvme core to handle fabrics specific commands and attributes
independent of the underlying transport.

In addition, the fabrics library adds a misc device interface that allow
actually creating a fabrics controller, as we can't just autodiscover
it like in the PCI case. The nvme-cli utility has been enhanced to use
this interface to support fabric connect and discovery.

Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>,
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>,
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>