History log of /linux-master/drivers/nvme/host/fc.c
Revision Date Author Comments
# 205fb5fa 04-Apr-2024 Daniel Wagner <dwagner@suse.de>

nvme-fc: rename free_ctrl callback to match name pattern

Rename nvme_fc_nvme_ctrl_freed to nvme_fc_free_ctrl to match the name
pattern for the callback.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# c3a846fe 29-Jan-2024 Nitin U. Yewale <nyewale@redhat.com>

nvme-fc: show hostnqn when connecting to fc target

Log hostnqn when connecting to nvme target.
As hostnqn could be changed, logging this information
in syslog at appropriate time may help in troubleshooting.

Signed-off-by: Nitin U. Yewale <nyewale@redhat.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 70fbfc47 31-Jan-2024 Daniel Wagner <dwagner@suse.de>

nvme-fc: do not wait in vain when unloading module

The module exit path has race between deleting all controllers and
freeing 'left over IDs'. To prevent double free a synchronization
between nvme_delete_ctrl and ida_destroy has been added by the initial
commit.

There is some logic around trying to prevent from hanging forever in
wait_for_completion, though it does not handling all cases. E.g.
blktests is able to reproduce the situation where the module unload
hangs forever.

If we completely rely on the cleanup code executed from the
nvme_delete_ctrl path, all IDs will be freed eventually. This makes
calling ida_destroy unnecessary. We only have to ensure that all
nvme_delete_ctrl code has been executed before we leave
nvme_fc_exit_module. This is done by flushing the nvme_delete_wq
workqueue.

While at it, remove the unused nvme_fc_wq workqueue too.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 0d150bef 31-Jan-2024 Caleb Sander <csander@purestorage.com>

nvme-fc: log human-readable opcode on timeout

The fc transport logs the opcode and fctype on command timeout.
This is sufficient information to identify the command issued,
but not very human-readable. Use the nvme_fabrics_opcode_str()
helper to also log the name of the command, as rdma and tcp already do.

Signed-off-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 92b0b0ff 23-Jan-2024 Chaitanya Kulkarni <kch@nvidia.com>

nvme: add module description to stop warnings

Add MODULE_DESCRIPTION() in order to remove warnings & get clean build:-

WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme-core.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme-fabrics.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme-rdma.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme-fc.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/nvme/host/nvme-tcp.o

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 5d51dc8d 18-Dec-2023 Keith Busch <kbusch@kernel.org>

nvme-fc: set numa_node after nvme_init_ctrl

nvme_init_ctrl() resets numa_node to NUMA_NO_NODE, so be sure to set the
desired value after that function call so it won't be overwritten.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# e5a4975c 19-Oct-2023 Justin Stitt <justinstitt@google.com>

nvme-fc: replace deprecated strncpy with strscpy

strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

Let's instead use strscpy() [2] as it guarantees NUL-termination on the
destination buffer.

Moreover, there is no need to use:

| min(FCNVME_ASSOC_HOSTNQN_LEN, NVMF_NQN_SIZE));

I imagine this was originally done to make sure the destination buffer
is NUL-terminated by ensuring we copy a number of bytes less than the
size of our destination, thus leaving some NUL-bytes at the end.

However, with strscpy(), we no longer need to do this and we can instead
opt for the more idiomatic strscpy() usage of:

| strscpy(dest, src, sizeof(dest))

Also, no NUL-padding is required as lsop is zero-allocated:

| lsop = kzalloc((sizeof(*lsop) +
| sizeof(*assoc_rqst) + sizeof(*assoc_acc) +
| ctrl->lport->ops->lsrqst_priv_sz), GFP_KERNEL);

... and assoc_rqst points to a field in lsop:

| assoc_rqst = (struct fcnvme_ls_cr_assoc_rqst *)&lsop[1];

Therefore, any additional NUL-byte assignments (like the ones that
strncpy() makes) are redundant.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Similar-to: https://lore.kernel.org/all/20231018-strncpy-drivers-nvme-host-fabrics-c-v1-1-b6677df40a35@google.com/
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20231019-strncpy-drivers-nvme-host-fc-c-v1-1-5805c15e4b49@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# d3e8b185 18-Dec-2023 Keith Busch <kbusch@kernel.org>

Revert "nvme-fc: fix race between error recovery and creating association"

The commit was identified to might sleep in invalid context and is
blocking regression testing.

This reverts commit ee6fdc5055e916b1dd497f11260d4901c4c1e55e.

Link: https://lore.kernel.org/linux-nvme/hkhl56n665uvc6t5d6h3wtx7utkcorw4xlwi7d2t2bnonavhe6@xaan6pu43ap6/
Link: https://lists.infradead.org/pipermail/linux-nvme/2023-December/043756.html
Reported-by: Daniel Wagner <dwagner@suse.de>
Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Cc: Michael Liang <mliang@purestorage.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# e6e7f7ac 27-Oct-2023 Keith Busch <kbusch@kernel.org>

nvme: ensure reset state check ordering

A different CPU may be setting the ctrl->state value, so ensure proper
barriers to prevent optimizing to a stale state. Normally it isn't a
problem to observe the wrong state as it is merely advisory to take a
quicker path during initialization and error recovery, but seeing an old
state can report unexpected ENETRESET errors when a reset request was in
fact successful.

Reported-by: Minh Hoang <mh2022@meta.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>


# 3af755a4 21-Nov-2023 Hannes Reinecke <hare@suse.de>

nvme: move nvme_stop_keep_alive() back to original position

Stopping keep-alive not only stops the keep-alive workqueue,
but also needs to be synchronized with I/O termination as we
must not send a keep-alive command after all I/O had been
terminated.
So to avoid any regressions move the call to stop_keep_alive()
back to its original position and ensure that keep-alive is
correctly stopped failing to setup the admin queue.

Fixes: 4733b65d82bd ("nvme: start keep-alive after admin queue setup")
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 4733b65d 24-Oct-2023 Hannes Reinecke <hare@suse.de>

nvme: start keep-alive after admin queue setup

Setting up I/O queues might take quite some time on larger and/or
busy setups, so KATO might expire before all I/O queues could be
set up.
Fix this by start keep alive from the ->init_ctrl_finish() callback,
and stopping it when calling nvme_cancel_admin_tagset().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Mark O'Donovan <shiftee@posteo.net>
[fixed nvme-fc compile error]
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 8ae5b3a6 17-Aug-2023 Nigel Kirkland <nkirkland2304@gmail.com>

nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()

The nvme_fc_fcp_op structure describing an AEN operation is initialized with a
null request structure pointer. An FC LLDD may make a call to
nvme_fc_io_getuuid passing a pointer to an nvmefc_fcp_req for an AEN operation.

Add validation of the request structure pointer before dereference.

Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# ee6fdc50 07-Jul-2023 Michael Liang <mliang@purestorage.com>

nvme-fc: fix race between error recovery and creating association

There is a small race window between nvme-fc association creation and error
recovery. Fix this race condition by protecting accessing to controller
state and ASSOC_FAILED flag under nvme-fc controller lock.

Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 60e445bd 07-Jul-2023 Michael Liang <mliang@purestorage.com>

nvme-fc: return non-zero status code when fails to create association

Return non-zero status code(-EIO) when needed, so re-connecting or
deleting controller will be triggered properly.

Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# d67790dd 22-May-2023 Kees Cook <keescook@chromium.org>

overflow: Add struct_size_t() helper

While struct_size() is normally used in situations where the structure
type already has a pointer instance, there are places where no variable
is available. In the past, this has been worked around by using a typed
NULL first argument, but this is a bit ugly. Add a helper to do this,
and replace the handful of instances of the code pattern with it.

Instances were found with this Coccinelle script:

@struct_size_t@
identifier STRUCT, MEMBER;
expression COUNT;
@@

- struct_size((struct STRUCT *)\(0\|NULL\),
+ struct_size_t(struct STRUCT,
MEMBER, COUNT)

Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: HighPoint Linux Team <linux@highpoint-tech.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: Don Brace <don.brace@microchip.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-scsi@vger.kernel.org
Cc: megaraidlinux.pdl@broadcom.com
Cc: storagedev@microchip.com
Cc: linux-xfs@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230522211810.never.421-kees@kernel.org


# 10a03c36 13-Mar-2023 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drivers: remove struct module * setting from struct class

There is no need to manually set the owner of a struct class, as the
registering function does it automatically, so remove all of the
explicit settings from various drivers that did so as it is unneeded.

This allows us to remove this pointer entirely from this structure going
forward.

Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230313181843.1207845-2-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 98e35280 20-Jan-2023 Ross Lagerwall <ross.lagerwall@citrix.com>

nvme-fc: fix initialization order

ctrl->ops is used by nvme_alloc_admin_tag_set() but set by
nvme_init_ctrl() so reorder the calls to avoid a NULL pointer
dereference.

Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# db45e1a5 30-Nov-2022 Christoph Hellwig <hch@lst.de>

nvme: consolidate setting the tagset flags

All nvme transports should be using the same flags for their tagsets,
with the exception for the blocking flag that should only be set for
transports that can block in ->queue_rq.

Add a NVME_F_BLOCKING flag to nvme_ctrl_ops to control the blocking
behavior and lift setting the flags into nvme_alloc_{admin,io}_tag_set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# dcef7727 30-Nov-2022 Christoph Hellwig <hch@lst.de>

nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set

Don't look at ctrl->ops as only RDMA and TCP actually support multiple
maps.

Fixes: 6dfba1c09c10 ("nvme-fc: use the tagset alloc/free helpers")
Fixes: ceee1953f923 ("nvme-loop: use the tagset alloc/free helpers")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 285b6e9b 08-Nov-2022 Christoph Hellwig <hch@lst.de>

nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl

Many of the callers decide which one to use based on a bool argument and
there is at least some code to be shared, so merge these two. Also
move a comment specific to a single callsite to that callsite.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hector Martin <marcan@marcan.st>


# b2969585 30-Nov-2022 Chaitanya Kulkarni <kch@nvidia.com>

nvme-fc: move common code into helper

Add a helper to move the duplicate code for error message
from nvme_fc_rcv_ls_req() to nvme_fc_rcv_ls_req_err_msg().

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6c90294d 30-Nov-2022 Chaitanya Kulkarni <kch@nvidia.com>

nvme-fc: avoid null pointer dereference

Before using dynamically allcoated variable lsop in the
nvme_fc_rcv_ls_req(), add a check for NULL and error out early.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6887fc64 02-Oct-2022 Sagi Grimberg <sagi@grimberg.me>

nvme: introduce nvme_start_request

In preparation for nvme-multipath IO stats accounting, we want the
accounting to happen in a centralized place. The request completion
is already centralized, but we need a common helper to request I/O
start.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>


# 9f27bd70 15-Nov-2022 Christoph Hellwig <hch@lst.de>

nvme: rename the queue quiescing helpers

Naming the nvme helpers that wrap the block quiesce functionality
_start/_stop is rather confusing. Switch to using the quiesce naming
used by the block layer instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 94cc781f 08-Nov-2022 Christoph Hellwig <hch@lst.de>

nvme: move OPAL setup from PCIe to core

Nothing about the TCG Opal support is PCIe transport specific, so move it
to the core code. For this nvme_init_ctrl_finish grows a new
was_suspended argument that allows the transport driver to tell the OPAL
code if the controller came out of a suspend cycle.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by Gerd Bayer <gbayer@linxu.ibm.com>


# cf3d00840 02-Oct-2022 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

nvme-fc: improve memory usage in nvme_fc_rcv_ls_req()

sizeof( struct nvmefc_ls_rcv_op ) = 64
sizeof( union nvmefc_ls_requests ) = 1024
sizeof( union nvmefc_ls_responses ) = 128

So, in nvme_fc_rcv_ls_req(), 1216 bytes of memory are requested when
kzalloc() is called.

Because of the way memory allocations are performed, 2048 bytes are
allocated. So about 800 bytes are wasted for each request.

Switch to 3 distinct memory allocations, in order to:
- save these 800 bytes
- avoid zeroing this extra memory
- make sure that memory is properly aligned in case of DMA access
("fc_dma_map_single(lsop->rspbuf)" just a few lines below)

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6dfba1c0 20-Sep-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: use the tagset alloc/free helpers

Use the common helpers to allocate and free the tagsets. To make this
work the generic nvme_ctrl now needs to be stored in the hctx private
data instead of the nvme_fc_ctrl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>


# 1864ea46 20-Sep-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: store the generic nvme_ctrl in set->driver_data

Point the private data to the generic controller structure in preparation
of using the common tagset init/exit code and use the chance the cleanup
the init_hctx methods a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>


# 18ecd975 20-Sep-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: keep ctrl->sqsize in sync with opts->queue_size

Also update the sqsize field when capping the queue size, and remove the
check a queue size that is larger than sqsize given that sqsize is only
initialized from opts->queue_size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>


# a4e1d0b7 15-Aug-2022 Bart Van Assche <bvanassche@acm.org>

block: Change the return type of blk_mq_map_queues() into void

Since blk_mq_map_queues() and the .map_queues() callbacks always return 0,
change their return type into void. Most callers ignore the returned value
anyway.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org
[axboe: fold in fix from Bart]
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9317d001 06-Aug-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: fix the fc_appid_store return value

"nvme-fc: fold t fc_update_appid into fc_appid_store" accidentally
changed the userspace interface for the appid attribute, because the code
that decrements "count" to remove a trailing '\n' in the parsing results
in the decremented value being incorrectly be returned from the sysfs
write. Fix this by keeping an orig_count variable for the full length
of the write.

Fixes: c814153c83a8 ("nvme-fc: fold t fc_update_appid into fc_appid_store")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Muneendra Kumar M <muneendra.kumar@broadcom.com>


# 6fb271f1 20-Jul-2022 Ming Lei <ming.lei@redhat.com>

nvme-fc: restart admin queue if the caller needs to restart queue

Without restarting admin queue in __nvme_fc_abort_outstanding_ios(),
it leaves controller not capable of handling admin pt request, and
causes io hang.

Fixes it by restarting admin queue if the caller of __nvme_fc_abort_outstanding_ios
requires to restart queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 2dd6532e 06-Jul-2022 John Garry <john.garry@huawei.com>

blk-mq: Drop 'reserved' arg of busy_tag_iter_fn

We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter
function so it may be dropped.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9bdb4833 06-Jul-2022 John Garry <john.garry@huawei.com>

blk-mq: Drop blk_mq_ops.timeout 'reserved' arg

With new API blk_mq_is_reserved_rq() we can tell if a request is from
the reserved pool, so stop passing 'reserved' arg. There is actually
only a single user of that arg for all the callback implementations, which
can use blk_mq_is_reserved_rq() instead.

This will also allow us to stop passing the same 'reserved' around the
blk-mq iter functions next.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6f8191fd 19-Jun-2022 Christoph Hellwig <hch@lst.de>

block: simplify disk shutdown

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.

Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.

This saves an extra queue freeze for devices without a separately
allocated queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 827fc630 19-May-2022 Muneendra Kumar <muneendra.kumar@broadcom.com>

scsi: nvme-fc: Add new routine nvme_fc_io_getuuid()

Add nvme_fc_io_getuuid() to the nvme-fc transport. The routine is invoked
by the FC LLDD on a per-I/O request basis. The routine translates from the
FC-specific request structure to the bio and the cgroup structure in order
to obtain the FC appid stored in the cgroup structure. If a value is not
set or a bio is not found, a NULL appid (aka uuid) will be returned to the
LLDD.

Link: https://lore.kernel.org/r/20220519123110.17361-2-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# c814153c 19-Apr-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: fold t fc_update_appid into fc_appid_store

No need for this wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 55d7baa3 19-Apr-2022 Christoph Hellwig <hch@lst.de>

nvme-fc: don't support the appid attribute without CONFIG_BLK_CGROUP_FC_APPID

nvme-fc appid support needs CONFIG_BLK_CGROUP_FC_APPID to work, so disable
the whole code if the option is not set.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 72e8b5cd 10-Feb-2022 Chaitanya Kulkarni <kch@nvidia.com>

nvme: add a helper to initialize connect_q

Add and use helper to remove duplicate code for fabrics connect_q
initialization and error handling for all the transports.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3dd83f40 14-Feb-2022 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: replace ida_simple[get|remove] with the simler ida_[alloc|free]

ida_simple_[get|remove] are wrappers anyways.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e5ea42fa 22-Sep-2021 Hannes Reinecke <hare@suse.de>

nvme: display correct subsystem NQN

With discovery controllers supporting unique subsystem NQNs the
actual subsystem NQN might be different from that one passed in
via the connect args. So add a helper to display the resulting
subsystem NQN.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 01d83816 23-Aug-2021 Saurav Kashyap <skashyap@marvell.com>

nvme-fc: add support for ->map_queues

NVMe FC don't have support for map queues, unlike the PCI, RDMA and TCP
transports. Add a ->map_queues callout for the LLDDs to provide such
functionality.

Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6ca1d902 14-Oct-2021 Ming Lei <ming.lei@redhat.com>

nvme: apply nvme API to quiesce/unquiesce admin queue

Apply the added two APIs to quiesce/unquiesce admin queue.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014081710.1871747-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bdaa1365 14-Sep-2021 James Smart <jsmart2021@gmail.com>

nvme-fc: remove freeze/unfreeze around update_nr_hw_queues

Remove the freeze/unfreeze around changes to the number of hardware
queues. Study and retest has indicated there are no ios that can be
active at this point so there is nothing to freeze.

nvme-fc is draining the queues in the shutdown and error recovery path
in __nvme_fc_abort_outstanding_ios.

This patch primarily reverts 88e837ed0f1f "nvme-fc: wait for queues to
freeze before calling update_hr_hw_queues". It's not an exact revert as
it leaves the adjusting of hw queues only if the count changes.

Signed-off-by: James Smart <jsmart2021@gmail.com>
[dwagner: added explanation why no IO is pending]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e5445dae 14-Sep-2021 James Smart <jsmart2021@gmail.com>

nvme-fc: avoid race between time out and tear down

To avoid race between time out and tear down, in tear down process,
first we quiesce the queue, and then delete the timer and cancel
the time out work for the queue.

This patch merges the admin and io sync ops into the queue teardown logic
as shown in the RDMA patch 3017013dcc "nvme-rdma: avoid race between time
out and tear down". There is no teardown_lock in nvme-fc.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 555f66d0 14-Sep-2021 Daniel Wagner <dwagner@suse.de>

nvme-fc: update hardware queues before using them

In case the number of hardware queues changes, we need to update the
tagset and the mapping of ctx to hctx first.

If we try to create and connect the I/O queues first, this operation
will fail (target will reject the connect call due to the wrong number
of queues) and hence we bail out of the recreate function. Then we
will to try the very same operation again, thus we don't make any
progress.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# be42a33b 10-Jun-2021 Keith Busch <kbusch@kernel.org>

nvme: use blk_execute_rq() for passthrough commands

The generic blk_execute_rq() knows how to handle polled completions. Use
that instead of implementing an nvme specific handler.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210610214437.641245-3-kbusch@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3dbbca75 07-Jun-2021 Muneendra Kumar <muneendra.kumar@broadcom.com>

scsi: nvme: Added a new sysfs attribute appid_store

Add a new sysfs attribute, appid_store, which can be used to set the
application identifier in the blkcg associated with a cgroup id.

Below is the interface provided to set the app_id:

echo "<cgroupid>:<appid>" >> /sys/class/fc/fc_udev_device/appid_store

echo "457E:100000109b521d27" >> /sys/class/fc/fc_udev_device/appid_store

Link: https://lore.kernel.org/r/20210608043556.274139-4-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# b61678bc 09-Jun-2021 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme-fc: use ctrl sgl check helper

Use the helper to check NVMe controller's SGL support.

Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f25f8ef7 21-May-2021 Hannes Reinecke <hare@suse.de>

nvme-fc: short-circuit reconnect retries

Returning an nvme status from nvme_fc_create_association() indicates
that the association is established, and we should honour the DNR bit.
If it's set a reconnect attempt will just return the same error, so
we can short-circuit the reconnect attempts and fail the connection
directly.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# a7d13914 10-May-2021 James Smart <jsmart2021@gmail.com>

nvme-fc: clear q_live at beginning of association teardown

The __nvmf_check_ready() routine used to bounce all filesystem io if the
controller state isn't LIVE. However, a later patch changed the logic so
that it rejection ends up being based on the Q live check. The FC
transport has a slightly different sequence from rdma and tcp for
shutting down queues/marking them non-live. FC marks its queue non-live
after aborting all ios and waiting for their termination, leaving a
rather large window for filesystem io to continue to hit the transport.
Unfortunately this resulted in filesystem I/O or applications seeing I/O
errors.

Change the FC transport to mark the queues non-live at the first sign of
teardown for the association (when I/O is initially terminated).

Fixes: 73a5379937ec ("nvme-fabrics: allow to queue requests for live queues")
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# a9715744 25-Apr-2021 Tao Chiu <taochiu@synology.com>

nvme: move the fabrics queue ready check routines to core

queue_rq() in pci only checks if the dispatched queue (nvmeq) is ready,
e.g. not being suspended. Since nvme_alloc_admin_tags() in reset flow
restarts the admin queue, users are able to submit admin commands to a
controller before reset_work() completes. Commands submitted under this
condition may interfere with commands that performs identify, IO queue
setup in reset_work(), and may result in a hang described in the
following patch.

As seen in the fabrics, user commands are prevented from being executed
under inproper controller states. We may reuse this logic to maintain a
clear admin queue during reset_work().

Signed-off-by: Tao Chiu <taochiu@synology.com>
Signed-off-by: Cody Wong <codywong@synology.com>
Reviewed-by: Leon Chien <leonchien@synology.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8df1bff5 30-Mar-2021 Max Gurtovoy <mgurtovoy@nvidia.com>

nvme-fc: check sgl supported by target

SGLs support is mandatory for NVMe/FC, make sure that the target is
aligned to the specification.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f4b9e6c9 17-Mar-2021 Keith Busch <kbusch@kernel.org>

nvme: use driver pdu command for passthrough

All nvme transport drivers preallocate an nvme command for each request.
Assume to use that command for nvme_setup_cmd() instead of requiring
drivers pass a pointer to it. All nvme drivers must initialize the
generic nvme_request 'cmd' to point to the transport's preallocated
nvme_command.

The generic nvme_request cmd pointer had previously been used only as a
temporary copy for passthrough commands. Since it now points to the
command that gets dispatched, passthrough commands must directly set it
up prior to executing the request.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 2afc4866 28-Feb-2021 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme-fc: fix the function documentation comment

The nvme_fc_rcv_ls_req() function has first argument as pointer to
remoteport named portprt, but in the documentation comment that is name
is used as remoteport. Fix that to get rid if the compilation warning.

drivers/nvme//host/fc.c:1724: warning: Function parameter or member 'portptr' not described in 'nvme_fc_rcv_ls_req'
drivers/nvme//host/fc.c:1724: warning: Excess function parameter 'remoteport' description in 'nvme_fc_rcv_ls_req'

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f21c4769 28-Feb-2021 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: rename nvme_init_identify()

This is a prep patch so that we can move the identify data structure
related code initialization from nvme_init_identify() into a helper.

Rename the function nvmet_init_identify() to nvmet_init_ctrl_finish().

Next patch will move the nvme_id_ctrl related initialization from newly
renamed function nvme_init_ctrl_finish() into the nvme_init_identify()
helper.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ed01fee2 03-Mar-2021 Christoph Hellwig <hch@lst.de>

nvme-fabrics: only reserve a single tag

Fabrics drivers currently reserve two tags on the admin queue. But
given that the connect command is only run on a freshly created queue
or after all commands have been force aborted we only need to reserve
a single tag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>


# f20ef34d 08-Mar-2021 James Smart <jsmart2021@gmail.com>

nvme-fc: fix racing controller reset and create association

Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in
interrupt context results in a possible race condition. A controller
reset results in errored io completions, which schedules error
work. The change of error work to a work element allows it to fire
after the ctrl state transition to NVME_CTRL_CONNECTING, causing
any outstanding io (used to initialize the controller) to fail and
cause problems for connect_work.

Add a state check to only schedule error work if not in the RESETTING
state.

Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context")
Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ae3afe63 26-Feb-2021 Hannes Reinecke <hare@suse.de>

nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted

When a command has been aborted we should return NVME_SC_HOST_ABORTED_CMD
to be consistent with the other transports.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3c7aafbc 26-Feb-2021 Hannes Reinecke <hare@suse.de>

nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange()

nvme_fc_terminate_exchange() is being called when exchanges are
being deleted, and as such we should be setting the NVME_REQ_CANCELLED
flag to have identical behaviour on all transports.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 60b152a5 08-Jan-2021 Rikard Falkeborn <rikard.falkeborn@gmail.com>

nvme: constify static attribute_group structs

The only usage of these is to put their addresses in arrays of pointers
to const attribute_groups. Make them const to allow the compiler to put
them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 19fce047 01-Dec-2020 James Smart <james.smart@broadcom.com>

nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context

Recent patches changed calling sequences. nvme_fc_abort_outstanding_ios
used to be called from a timeout or work context. Now it is being called
in an io completion context, which can be an interrupt handler.
Unfortunately, the abort outstanding ios routine attempts to stop nvme
queues and nested routines that may try to sleep, which is in conflict
with the interrupt handler.

Correct replacing the direct call with a work element scheduling, and the
abort outstanding ios routine will be called in the work element.

Fixes: 95ced8a2c72d ("nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery")
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Daniel Wagner <dwagner@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# dc96f938 09-Nov-2020 Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

nvme: use consistent macro name for timeout

This is purely a clenaup patch, add prefix NVME to the ADMIN_TIMEOUT to
make consistent with NVME_IO_TIMEOUT.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ac9b820e 23-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: remove nvme_fc_terminate_io()

__nvme_fc_terminate_io() is now called by only 1 place, in reset_work.
Consoldate and move the functionality of terminate_io into reset_work.

In reset_work, rather than calling the create_association directly,
schedule the connect work element to do its thing. After scheduling,
flush the connect work element to continue with semantic of not
returning until connect has been attempted at least once.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 95ced8a2 23-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: eliminate terminate_io use by nvme_fc_error_recovery

nvme_fc_error_recovery() special cases handling when in CONNECTING state
and calls __nvme_fc_terminate_io(). __nvme_fc_terminate_io() itself
special cases CONNECTING state and calls the routine to abort outstanding
ios.

Simplify the sequence by putting the call to abort outstanding I/Os
directly in nvme_fc_error_recovery.

Move the location of __nvme_fc_abort_outstanding_ios(), and
nvme_fc_terminate_exchange() which is called by it, to avoid adding
function prototypes for nvme_fc_error_recovery().

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9c2bb257 23-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: remove err_work work item

err_work was created to handle errors (mainly I/O timeouts) while in
CONNECTING state. The flag for err_work_active is also unneeded.

Remove err_work_active and err_work. The actions to abort I/Os are moved
inline to nvme_error_recovery().

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# caf1cbe3 23-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: track error_recovery while connecting

Whenever there are errors during CONNECTING, the driver recovers by
aborting all outstanding ios and counts on the io completion to fail them
and thus the connection/association they are on. However, the connection
failure depends on a failure state from the core routines. Not all
commands that are issued by the core routine are guaranteed to cause a
failure of the core routine. They may be treated as a failure status and
the status is then ignored.

As such, whenever the transport enters error_recovery while CONNECTING,
it will set a new flag indicating an association failed. The
create_association routine which creates and initializes the controller,
will monitor the state of the flag as well as the core routine error
status and ensure the association fails if there was an error.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f673714a 16-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: shorten reconnect delay if possible for FC

We've had several complaints about a 10s reconnect delay (the default)
when there was an error while there is connectivity to a subsystem.
The max_reconnects and reconnect_delay are set in common code prior to
calling the transport to create the controller.

This change checks if the default reconnect delay is being used, and if
so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
It does so by calculating the controller loss tmo window, changing the
value of the reconnect delay, and then recalculating the maximum number
of reconnect attempts allowed.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 88e837ed 16-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: wait for queues to freeze before calling update_hr_hw_queues

On reconnect, the code currently does not freeze the controller before
possibly updating the number hw queues for the controller.

Add the freeze before updating the number of hw queues. Note: the queues
are already started and remain started through the reconnect.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 514a6dc9 16-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: fix error loop in create_hw_io_queues

The loop that backs out of hw io queue creation continues through index
0, which corresponds to the admin queue as well.

Fix the loop so it only proceeds through indexes 1..n which correspond to
I/O queues.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 52793d62 16-Oct-2020 James Smart <james.smart@broadcom.com>

nvme-fc: fix io timeout to abort I/O

Currently, an I/O timeout unconditionally invokes
nvme_fc_error_recovery() which checks for LIVE or CONNECTING state. If
live, the routine resets the controller which initiates a reconnect -
which is valid. If CONNECTING, err_work is scheduled. Err_work then
calls the terminate_io routine, which also checks for CONNECTING and
noops any further action on outstanding I/O. The result is nothing
happened to the timed out io. As such, if the command was dropped on
the wire, it will never timeout / complete, and the connect process
will hang.

Change the behavior of the io timeout routine to unconditionally abort
the I/O. I/O completion handling will note that an io failed due to an
abort and will terminate the connection / association as needed. If the
abort was unable to happen, continue with a call to
nvme_fc_error_recovery(). To ensure something different happens in
nvme_fc_error_recovery() rework it so at it will abort all I/Os on the
association to force a failure.

As I/O aborts now may occur outside of delete_association, counting for
completion must be wary and only count those aborted during
delete_association when TERMIO is set on the controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9e0e8dac 17-Sep-2020 James Smart <james.smart@broadcom.com>

nvme-fc: fail new connections to a deleted host or remote port

The lldd may have made calls to delete a remote port or local port and
the delete is in progress when the cli then attempts to create a new
controller. Currently, this proceeds without error although it can't be
very successful.

Fix this by validating that both the host port and remote port are
present when a new controller is to be created.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e126e821 02-Sep-2020 David Milburn <dmilburn@redhat.com>

nvme-fc: cancel async events before freeing event struct

Cancel async event work in case async event has been queued up, and
nvme_fc_submit_async_event() runs after event has been freed.

Signed-off-by: David Milburn <dmilburn@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 2eb81a33 18-Aug-2020 Christoph Hellwig <hch@lst.de>

nvme: rename and document nvme_end_request

nvme_end_request is a bit misnamed, as it wraps around the
blk_mq_complete_* API. It's semantics also are non-trivial, so give it
a more descriptive name and add a comment explaining the semantics.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f34448cd 02-Aug-2020 Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

nvme-fc: Fix wrong return value in __nvme_fc_init_request()

On an error exit path, a negative error code should be returned
instead of a positive return value.

Fixes: e399441de9115 ("nvme-fabrics: Add host support for FC transport")
Cc: James Smart <jsmart2021@gmail.com>
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 23748076 14-Jul-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: set max_segments to lldd max value

Currently the FC transport is set max_hw_sectors based on the lldds
max sgl segment count. However, the block queue max segments is
set based on the controller's max_segments count, which the transport
does not set. As such, the lldd is receiving sgl lists that are
exceeding its max segment count.

Set the controller max segment count and derive max_hw_sectors from
the max segment count.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ecca390e 22-Jul-2020 Sagi Grimberg <sagi@grimberg.me>

nvme: fix deadlock in disconnect during scan_work and/or ana_work

A deadlock happens in the following scenario with multipath:
1) scan_work(nvme0) detects a new nsid while nvme0
is an optimized path to it, path nvme1 happens to be
inaccessible.

2) Before scan_work is complete nvme0 disconnect is initiated
nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING

3) scan_work(1) attempts to submit IO,
but nvme_path_is_optimized() observes nvme0 is not LIVE.
Since nvme1 is a possible path IO is requeued and scan_work hangs.

--
Workqueue: nvme-wq nvme_scan_work [nvme_core]
kernel: Call Trace:
kernel: __schedule+0x2b9/0x6c0
kernel: schedule+0x42/0xb0
kernel: io_schedule+0x16/0x40
kernel: do_read_cache_page+0x438/0x830
kernel: read_cache_page+0x12/0x20
kernel: read_dev_sector+0x27/0xc0
kernel: read_lba+0xc1/0x220
kernel: efi_partition+0x1e6/0x708
kernel: check_partition+0x154/0x244
kernel: rescan_partitions+0xae/0x280
kernel: __blkdev_get+0x40f/0x560
kernel: blkdev_get+0x3d/0x140
kernel: __device_add_disk+0x388/0x480
kernel: device_add_disk+0x13/0x20
kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core]
kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core]
kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core]
kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core]
kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core]
kernel: nvme_validate_ns+0x396/0x940 [nvme_core]
kernel: nvme_scan_work+0x24f/0x380 [nvme_core]
kernel: process_one_work+0x1db/0x380
kernel: worker_thread+0x249/0x400
kernel: kthread+0x104/0x140
--

4) Delete also hangs in flush_work(ctrl->scan_work)
from nvme_remove_namespaces().

Similiarly a deadlock with ana_work may happen: if ana_work has started
and calls nvme_mpath_set_live and device_add_disk, it will
trigger I/O. When we trigger disconnect I/O will block because
our accessible (optimized) path is disconnecting, but the alternate
path is inaccessible, so I/O blocks. Then disconnect tries to flush
the ana_work and hangs.

[ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core]
[ 605.552087] Call Trace:
[ 605.552683] __schedule+0x2b9/0x6c0
[ 605.553507] schedule+0x42/0xb0
[ 605.554201] io_schedule+0x16/0x40
[ 605.555012] do_read_cache_page+0x438/0x830
[ 605.556925] read_cache_page+0x12/0x20
[ 605.557757] read_dev_sector+0x27/0xc0
[ 605.558587] amiga_partition+0x4d/0x4c5
[ 605.561278] check_partition+0x154/0x244
[ 605.562138] rescan_partitions+0xae/0x280
[ 605.563076] __blkdev_get+0x40f/0x560
[ 605.563830] blkdev_get+0x3d/0x140
[ 605.564500] __device_add_disk+0x388/0x480
[ 605.565316] device_add_disk+0x13/0x20
[ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core]
[ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core]
[ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core]
[ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core]
[ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core]
[ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core]
[ 605.573330] process_one_work+0x1db/0x380
[ 605.574144] worker_thread+0x4d/0x400
[ 605.574896] kthread+0x104/0x140
[ 605.577205] ret_from_fork+0x35/0x40
[ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds.
[ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830
[ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 605.582320] nvme D 0 14044 14043 0x00000000
[ 605.583424] Call Trace:
[ 605.583935] __schedule+0x2b9/0x6c0
[ 605.584625] schedule+0x42/0xb0
[ 605.585290] schedule_timeout+0x203/0x2f0
[ 605.588493] wait_for_completion+0xb1/0x120
[ 605.590066] __flush_work+0x123/0x1d0
[ 605.591758] __cancel_work_timer+0x10e/0x190
[ 605.593542] cancel_work_sync+0x10/0x20
[ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core]
[ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core]
[ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core]
[ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core]
[ 605.598320] dev_attr_store+0x17/0x30

Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will
indicate the phase of controller deletion where I/O cannot be allowed
to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to
be issued to the bottom device, and only after we flush the ana_work
and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces)
we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work
from re-firing by aborting early if we are not LIVE, so we should be safe
here.

In addition, change the transport drivers to follow the updated state
machine.

Fixes: 0d0b660f214d ("nvme: add ANA support")
Reported-by: Anton Eidelman <anton@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ff029451 11-Jun-2020 Christoph Hellwig <hch@lst.de>

nvme: use blk_mq_complete_request_remote to avoid an indirect function call

Use the new blk_mq_complete_request_remote helper to avoid an indirect
function call in the completion fast path.

Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c9c12e51 29-May-2020 Daniel Wagner <dwagner@suse.de>

nvme-fc: don't call nvme_cleanup_cmd() for AENs

Asynchronous event notifications do not have an associated request.
When fcp_io() fails we unconditionally call nvme_cleanup_cmd() which
leads to a crash.

Fixes: 16686f3a6c3c ("nvme: move common call to nvme_cleanup_cmd to core layer")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <hmadhani2024@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f1e71d75 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

nvme: replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 614fc1c0 12-May-2020 Martin George <marting@netapp.com>

nvme-fc: print proper nvme-fc devloss_tmo value

The nvme-fc devloss_tmo is computed as the min of either the
ctrl_loss_tmo (max_retries * reconnect_delay) or the remote port's
devloss_tmo. But what gets printed as the nvme-fc devloss_tmo in
nvme_fc_reconnect_or_delete() is always the remote port's devloss_tmo
value. So correct this by printing the min value instead.

Signed-off-by: Martin George <marting@netapp.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3add1d93 30-Apr-2020 Arnd Bergmann <arnd@arndb.de>

nvme-fc: avoid gcc-10 zero-length-bounds warning

When CONFIG_ARCH_NO_SG_CHAIN is set, op->sgl[0] cannot be dereferenced,
as gcc-10 now points out:

drivers/nvme/host/fc.c: In function 'nvme_fc_init_request':
drivers/nvme/host/fc.c:1774:29: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct scatterlist[0]' [-Wzero-length-bounds]
1774 | op->op.fcp_req.first_sgl = &op->sgl[0];
| ^~~~~~~~~~~
drivers/nvme/host/fc.c:98:21: note: while referencing 'sgl'
98 | struct scatterlist sgl[NVME_INLINE_SG_CNT];
| ^~~

I don't know if this is a legitimate warning or a false-positive.
If this is just a false alarm, the warning is easily suppressed
by interpreting the array as a pointer.

Fixes: b1ae1a238900 ("nvme-fc: Avoid preallocating big SGL for data")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 14fd1e98 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: Add Disconnect Association Rcv support

The nvme-fc host transport did not support the reception of a
FC-NVME LS. Reception is necessary to implement full compliance
with FC-NVME-2.

Populate the LS receive handler, and specifically the handling
of a Disconnect Association LS. The response to the LS, if it
matched a controller, must be sent after the aborts for any
I/O on any connection have been sent.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fd5a5f22 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: Update header and host for common definitions for LS handling

Given that both host and target now generate and receive LS's create
a single table definition for LS names. Each tranport half will have
a local version of the table.

As Create Association LS is issued by both sides, and received by
both sides, create common routines to format the LS and to validate
the LS.

Convert the host side transport to use the new common Create
Association LS formatting routine.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# eb4ee8f1 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: convert assoc_active flag to bit op

Convert the assoc_active boolean flag to a bitop on the flags field.
The bit ops will provide atomicity.

To make this change, the flags field was converted to a long type,
which also affects the FCCTRL_TERMIO flag. Both FCCTRL_TERMIO and
now ASSOC_ACTIVE flags are set/cleared by bit operations.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f56bf76f 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: Ensure private pointers are NULL if no data

Ensure that when allocations are done, and the lldd options indicate
no private data is needed, that private pointers will be set to NULL
(catches driver error that forgot to set private data size).

Slightly reorg the allocations so that private data follows allocations
for LS request/response buffers. Ensures better alignments for the buffers
as well as the private pointer.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ca19bcd0 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc nvmet-fc: refactor for common LS definitions

Routines in the target will want to be used in the host as well.
Error definitions should now shared as both sides will process
requests and responses to requests.

Moved common declarations to new fc.h header kept in the host
subdirectory.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 72e6329f 31-Mar-2020 James Smart <jsmart2021@gmail.com>

nvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request

The current LLDD api has:
nvme-fc: contains api for transport to do LS requests (and aborts of
them). However, there is no interface for reception of LS's and sending
responses for them.
nvmet-fc: contains api for transport to do reception of LS's and sending
of responses for them. However, there is no interface for doing LS
requests.

Revise the api's so that both nvme-fc and nvmet-fc can send LS's, as well
as receiving LS's and sending their responses.

Change name of the rcv_ls_req struct to better reflect generic use as
a context to used to send an ls rsp. Specifically:
nvmefc_tgt_ls_req -> nvmefc_ls_rsp
nvmefc_tgt_ls_req.nvmet_fc_private -> nvmefc_ls_rsp.nvme_fc_private

Change nvmet_fc_rcv_ls_req() calling sequence to provide handle that
can be used by transport in later LS request sequences for an association.

nvme-fc nvmet_fc nvme_fcloop:
Revise to adapt to changed names in api header.
Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.
Add stubs for new interfaces:
host/fc.c: nvme_fc_rcv_ls_req()
target/fc.c: nvmet_fc_invalidate_host()

lpfc:
Revise to adapt code to changed names in api header.
Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8c5c6605 03-Apr-2020 James Smart <jsmart2021@gmail.com>

nvme-fc: Revert "add module to ops template to allow module references"

The original patch was to resolve the lldd being able to be unloaded
while being used to talk to the boot device of the system. However, the
end result of the original patch is that any driver unload while a nvme
controller is live via the lldd is now being prohibited. Given the module
reference, the module teardown routine can't be called, thus there's no
way, other than manual actions to terminate the controllers.

Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references")
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 726612b6 24-Mar-2020 Israel Rukshin <israelr@mellanox.com>

nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl

Put the ctrl reference count at nvme_uninit_ctrl as opposed to
nvme_init_ctrl which takes it. This decrease the reference count at the
core layer instead of decreasing it on each transport separately.
Also move the call of nvme_uninit_ctrl at PCI driver after calling to
nvme_release_prp_pools and nvme_dev_unmap, in order to put the reference
count after using the dev. This is safe because those functions use
nvme_dev which is freed only later at nvme_pci_free_ctrl.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# b780d741 24-Mar-2020 Israel Rukshin <israelr@mellanox.com>

nvme: Fix ctrl use-after-free during sysfs deletion

In case nvme_sysfs_delete() is called by the user before taking the ctrl
reference count, the ctrl may be freed during the creation and cause the
bug. Take the reference as soon as the controller is externally visible,
which is done by cdev_device_add() in nvme_init_ctrl(). Also take the
reference count at the core layer instead of taking it on each transport
separately.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# c869e494 21-Nov-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: fix double-free scenarios on hw queues

If an error occurs on one of the ios used for creating an
association, the creating routine has error paths that are
invoked by the command failure and the error paths will free
up the controller resources created to that point.

But... the io was ultimately determined by an asynchronous
completion routine that detected the error and which
unconditionally invokes the error_recovery path which calls
delete_association. Delete association deletes all outstanding
io then tears down the controller resources. So the
create_association thread can be running in parallel with
the error_recovery thread. What was seen was the LLDD received
a call to delete a queue, causing the LLDD to do a free of a
resource, then the transport called the delete queue again
causing the driver to repeat the free call. The second free
routine corrupted the allocator. The transport shouldn't be
making the duplicate call, and the delete queue is just one
of the resources being freed.

To fix, it is realized that the create_association path is
completely serialized with one command at a time. So the
failed io completion will always be seen by the create_association
path and as of the failure, there are no ios to terminate and there
is no reason to be manipulating queue freeze states, etc.
The serialized condition stays true until the controller is
transitioned to the LIVE state. Thus the fix is to change the
error recovery path to check the controller state and only
invoke the teardown path if not already in the CONNECTING state.

Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 863fbae9 14-Nov-2019 James Smart <jsmart2021@gmail.com>

nvme_fc: add module to ops template to allow module references

In nvme-fc: it's possible to have connected active controllers
and as no references are taken on the LLDD, the LLDD can be
unloaded. The controller would enter a reconnect state and as
long as the LLDD resumed within the reconnect timeout, the
controller would resume. But if a namespace on the controller
is the root device, allowing the driver to unload can be problematic.
To reload the driver, it may require new io to the boot device,
and as it's no longer connected we get into a catch-22 that
eventually fails, and the system locks up.

Fix this issue by taking a module reference for every connected
controller (which is what the core layer did to the transport
module). Reference is cleared when the controller is removed.

Acked-by: Himanshu Madhani <hmadhani@marvell.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# b1ae1a23 24-Nov-2019 Israel Rukshin <israelr@mellanox.com>

nvme-fc: Avoid preallocating big SGL for data

nvme_fc_create_io_queues() preallocates a big buffer for the IO SGL based
on SG_CHUNK_SIZE.

Modern DMA engines are often capable of dealing with very big segments so
the SG_CHUNK_SIZE is often too big. SG_CHUNK_SIZE results in a static 4KB
SGL allocation per command.

If a controller has lots of deep queues, preallocation for the sg list can
consume substantial amounts of memory. For nvme-fc, nr_hw_queues can be
128 and each queue's depth 128. This means the resulting preallocation
for the data SGL is 128*128*4K = 64MB per controller.

Switch to runtime allocation for SGL for lists longer than 2 entries. This
is the approach used by NVMe PCI so it should be reasonable for NVMeOF as
well. Runtime SGL allocation has always been the case for the legacy I/O
path so this is nothing new.

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>


# 16686f3a 13-Oct-2019 Max Gurtovoy <maxg@mellanox.com>

nvme: move common call to nvme_cleanup_cmd to core layer

nvme_cleanup_cmd should be called for each call to nvme_setup_cmd
(symmetrical functions). Move the call for nvme_cleanup_cmd to the common
core layer and call it during nvme_complete_rq for the good flow. For
error flow, each transport will call nvme_cleanup_cmd independently. Also
take care of a special case of path failure, where we call
nvme_complete_rq without doing nvme_setup_cmd.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# bcde5f0f 27-Sep-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: ensure association_id is cleared regardless of a Disconnect LS

Code today only clears the association_id if a Disconnect LS is transmit.

Remove ambiguity and unconditionally clear the association_id if the
association has been terminated.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7db39484 27-Sep-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: clarify error messages

Change wording on a couple of messages to clarify what happened.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 44fbf3bb 27-Sep-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: Set new cmd set indicator in nvme-fc cmnd iu

Set the new category field in the FC-NVME CMND_IU based on queue number.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 53b2b2f5 27-Sep-2019 James Smart <jsmart2021@gmail.com>

nvme-fc and nvmet-fc: sync with FC-NVME-2 header changes

Sync sources with revised structure and field names to correspond with
FC-NVME-2 header sync-up.

Tested interoperability with success:
- prior initiator with new target
- prior target with new initiator
- new on new

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 74bd8cbe 06-Aug-2019 James Smart <james.smart@broadcom.com>

nvme-fc: Fail transport errors with NVME_SC_HOST_PATH

NVME_SC_INTERNAL should indicate an internal controller errors
and not host transport errors. These errors will propagate to
upper layers (essentially nvme core) and be interpereted as
transport errors which should not be taken into account for
namespace state or condition.

Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# f15872c5 28-Aug-2019 Israel Rukshin <israelr@mellanox.com>

nvme-fc: Use rq_dma_dir macro

Remove code duplication.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# e7832cb4 02-Aug-2019 Sagi Grimberg <sagi@grimberg.me>

nvme: make fabrics command run on a separate request queue

We have a fundamental issue that fabric commands use the admin_q.
The reason is, that admin-connect, register reads and writes and
admin commands cannot be guaranteed ordering while we are running
controller resets.

For example, when we reset a controller we perform:
1. disable the controller
2. teardown the admin queue
3. re-establish the admin queue
4. enable the controller

In order to perform (3), we need to unquiesce the admin queue, however
we may have some admin commands that are already pending on the
quiesced admin_q and will immediate execute when we unquiesce it before
we execute (4). The host must not send admin commands to the controller
before enabling the controller.

To fix this, we have the fabric commands (admin connect and property
get/set, but not I/O queue connect) use a separate fabrics_q and make
sure to quiesce the admin_q before we disable the controller, and
unquiesce it only after we enable the controller.

This fixes the error prints from nvmet in a controller reset storm test:
kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
Which indicate that the host is sending an admin command when the
controller is not enabled.

Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# c0f2f45b 22-Jul-2019 Sagi Grimberg <sagi@grimberg.me>

nvme: move sqsize setting to the core

nvme_enable_ctrl reads the cap register right after, so
no need to do that locally in the transport driver. Have
sqsize setting in nvme_init_identify.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 622b8b68 23-Jul-2019 Ming Lei <ming.lei@redhat.com>

nvme: wait until all completed request's complete fn is called

When aborting in-flight request for recovering controller, we have
to make sure that queue's complete function is called on completed
request before moving on. Otherwise, for example, the warning of
WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be
triggered on nvme-rdma.

Fix this issue by using blk_mq_tagset_wait_completed_request.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4c73cbdf 28-Jun-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: fix module unloads while lports still pending

Current code allows the module to be unloaded even if there are
pending data structures, such as localports and controllers on
the localports, that have yet to hit their reference counting
to remove them.

Fix by having exit entrypoint explicitly delete every controller,
which in turn will remove references on the remoteports and localports
causing them to be deleted as well. The exit entrypoint, after
initiating the deletes, will wait for the last localport to be deleted
before continuing.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4bea364f 29-May-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: add message when creating new association

When looking at console messages to troubleshoot, there are one
maybe two messages before creation of the controller is complete.
However, a lot of io takes place to reach that point. It's unclear
when things have started.

Add a message when the controller is attempting to create a new
association. Thus we know what controller, between what host and
remote port, and what NQN is being put into place for any
subsequent success or failure messages.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Giridhar Malavali <gmalavali@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4635873c 28-Apr-2019 Ming Lei <ming.lei@redhat.com>

scsi: lib/sg_pool.c: improve APIs for allocating sg pool

sg_alloc_table_chained() currently allows the caller to provide one
preallocated SGL and returns if the requested number isn't bigger than
size of that SGL. This is used to inline an SGL for an IO request.

However, scattergather code only allows that size of the 1st preallocated
SGL to be SG_CHUNK_SIZE(128). This means a substantial amount of memory
(4KB) is claimed for the SGL for each IO request. If the I/O is small, it
would be prudent to allocate a smaller SGL.

Introduce an extra parameter to sg_alloc_table_chained() and
sg_free_table_chained() for specifying size of the preallocated SGL.

Both __sg_free_table() and __sg_alloc_table() assume that each SGL has the
same size except for the last one. Change the code to allow both functions
to accept a variable size for the 1st preallocated SGL.

[mkp: attempted to clarify commit desc]

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: netdev@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 8730c1dd 03-May-2019 Hannes Reinecke <hare@suse.de>

nvme-fc: use separate work queue to avoid warning

When tearing down a controller the following warning is issued:

WARNING: CPU: 0 PID: 30681 at ../kernel/workqueue.c:2418 check_flush_dependency

This happens as the err_work workqueue item is scheduled on the
system workqueue (which has WQ_MEM_RECLAIM not set), but is flushed
from a workqueue which has WQ_MEM_RECLAIM set.

Fix this by providing an FC-NVMe specific workqueue.

Fixes: 4cff280a5fcc ("nvme-fc: resolve io failures during connect")
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# a6a6d058 10-Apr-2019 Hannes Reinecke <hare@suse.de>

scsi: scsi_transport_fc: nvme: display FC-NVMe port roles

Currently the FC-NVMe driver is leverating the SCSI FC transport class to
access the remote ports. Which means that all FC-NVMe remote ports will be
visible to the fc transport layer, but due to missing definitions the port
roles will always be 'unknown'. This patch adds the missing definitions to
the fc transport class to that the port roles are correctly displayed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Giridhar Malavali <gmalavali@marvell.com>
Reviewed-by: Himanshu Madhani <hmadhani@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 67f471b6 08-Apr-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: correct csn initialization and increments on error

This patch fixes a long-standing bug that initialized the FC-NVME
cmnd iu CSN value to 1. Early FC-NVME specs had the connection starting
with CSN=1. By the time the spec reached approval, the language had
changed to state a connection should start with CSN=0. This patch
corrects the initialization value for FC-NVME connections.

Additionally, in reviewing the transport, the CSN value is assigned to
the new IU early in the start routine. It's possible that a later dma
map request may fail, causing the command to never be sent to the
controller. Change the location of the assignment so that it is
immediately prior to calling the lldd. Add a comment block to explain
the impacts if the lldd were to additionally fail sending the command.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 834d3710 13-Mar-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: reject reconnect if io queue count is reduced to zero

If:

- A successful connect has occurred with an io queue count greater than
zero and namespaces detected and running.
- An error or something occurs which causes a termination of the prior
association and then starts a reconnect,
- The reconnect then creates a new controller, but for whatever reason,
nvme_set_queue_count() results in io queue count set to zero. This
will skip io queue and tag set changes.
- But... the controller will transition to live, calling
nvme_start_ctrl, which calls nvme_start_queues(), which then releases
I/Os into the transport which then sends them to the driver.

As there are no queues, things eventually hit the driver looking for a
handle, which was cleared when the original controller was reset, and it
can't proceed. Worst case, things progress, but everything fails.

In the failing scenario, the nvme_set_features(NVME_FEAT_NUM_QUEUES)
command actually failed with a NVME_SC_INTERNAL error. For some reason,
although nvme_set_queue_count() saw the error and set io queue count to
zero, it doesn't return a failure status to the transport, which allows
the transport to continue using the controller.

Fix the problem by simply rejecting the new association if at least 1
I/O queue can't be created. The association reject will fail the
reconnect attempt and fall into the reconnect retry policy.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 06f3d71e 13-Mar-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: fix numa_node when dev is null

A recent change added a numa_node field to the nvme controller
and has the transport assign the node using dev_to_node().
However, fcloop registers with a NULL device struct, so the
dev_to_node() call oops.

Revise the assignment to assign no node when device struct is null.

Fixes: 103e515efa89b ("nvme: add a numa_node field to struct nvme_ctrl")
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
[hch: small coding style fixup]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9f7d8ae2 13-Mar-2019 James Smart <jsmart2021@gmail.com>

nvme-fc: use nr_phys_segments to determine existence of sgl

For some nvme command, when issued by the nvme core layer, there
is an internal buffer which can cause blk_rq_payload_bytes() to
return a non-zero value yet there is no actual/real command payload
and sg list. An example is the WRITE ZEROES command.

To address this, when making choices on whether to dma map an sgl,
use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes().
When there is a sgl, blk_rq_payload_bytes() will return the amount
of data to be transferred by the sgl.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8638b246 18-Feb-2019 Christoph Hellwig <hch@lst.de>

nvme-fc: convert to SPDX identifiers

Update license to use SPDX-License-Identifier instead of verbose license
text.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# 26c68227 14-Dec-2018 Sagi Grimberg <sagi@grimberg.me>

nvme-fabrics: allow nvmf_connect_io_queue to poll

Preparation for polling support for fabrics. Polling support
means that our completion queues are not generating any interrupts
which means we need to poll for the nvmf io queue connect as well.

Reviewed by Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 103e515e 16-Nov-2018 Hannes Reinecke <hare@suse.com>

nvme: add a numa_node field to struct nvme_ctrl

Instead of directly poking into the struct device add a new numa_node
field to struct nvme_ctrl. This allows fabrics drivers where ctrl->dev
is a virtual device to support NUMA affinity as well.

Also expose the field as a sysfs attribute, and populate it for the
RDMA and FC transports.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# dfa74422 25-Nov-2018 Ewan D. Milne <emilne@redhat.com>

nvme-fc: initialize nvme_req(rq)->ctrl after calling __nvme_fc_init_request()

__nvme_fc_init_request() invokes memset() on the nvme_fcp_op_w_sgl structure, which
NULLed-out the nvme_req(req)->ctrl field previously set by nvme_fc_init_request().
This apparently was not referenced until commit faf4a44fff ("nvme: support traffic
based keep-alive") which now results in a crash in nvme_complete_rq():

[ 8386.897130] RIP: 0010:panic+0x220/0x26c
[ 8386.901406] Code: 83 3d 6f ee 72 01 00 74 05 e8 e8 54 02 00 48 c7 c6 40 fd 5b b4 48 c7 c7 d8 8d c6 b3 31e
[ 8386.922359] RSP: 0018:ffff99650019fc40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 8386.930804] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[ 8386.938764] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8e325f8168b0
[ 8386.946725] RBP: ffff99650019fcb0 R08: 0000000000000000 R09: 00000000000004f8
[ 8386.954687] R10: 0000000000000000 R11: ffff99650019f9b8 R12: ffffffffb3c55f3c
[ 8386.962648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[ 8386.970613] oops_end+0xd1/0xe0
[ 8386.974116] no_context+0x1b2/0x3c0
[ 8386.978006] do_page_fault+0x32/0x140
[ 8386.982090] page_fault+0x1e/0x30
[ 8386.985786] RIP: 0010:nvme_complete_rq+0x65/0x1d0 [nvme_core]
[ 8386.992195] Code: 41 bc 03 00 00 00 74 16 0f 86 c3 00 00 00 66 3d 83 00 41 bc 06 00 00 00 0f 85 e7 00 000
[ 8387.013147] RSP: 0018:ffff99650019fe18 EFLAGS: 00010246
[ 8387.018973] RAX: 0000000000000000 RBX: ffff8e322ae51280 RCX: 0000000000000001
[ 8387.026935] RDX: 0000000000000400 RSI: 0000000000000001 RDI: ffff8e322ae51280
[ 8387.034897] RBP: ffff8e322ae51280 R08: 0000000000000000 R09: ffffffffb2f0b890
[ 8387.042859] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 8387.050821] R13: 0000000000000100 R14: 0000000000000004 R15: ffff8e2b0446d990
[ 8387.058782] ? swiotlb_unmap_page+0x40/0x40
[ 8387.063448] nvme_fc_complete_rq+0x2d/0x70 [nvme_fc]
[ 8387.068986] blk_done_softirq+0xa1/0xd0
[ 8387.073264] __do_softirq+0xd6/0x2a9
[ 8387.077251] run_ksoftirqd+0x26/0x40
[ 8387.081238] smpboot_thread_fn+0x10e/0x160
[ 8387.085807] kthread+0xf8/0x130
[ 8387.089309] ? sort_range+0x20/0x20
[ 8387.093198] ? kthread_stop+0x110/0x110
[ 8387.097475] ret_from_fork+0x35/0x40
[ 8387.101462] ---[ end trace 7106b0adf5e422f8 ]---

Fixes: faf4a44fff ("nvme: support traffic based keep-alive")
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 92f806d6 19-Nov-2018 Jens Axboe <axboe@kernel.dk>

nvme-fc: remove ->poll implementation

It's specifically looking for a given request, which we will not be
supporting going forward. Also kill the qla2xxx poll implementation
as that's the only user of the nvme-fc poll, and the now unused
->poll_queue() hook.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4cff280a 14-Nov-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: resolve io failures during connect

If an io error occurs on an io issued while connecting, recovery
of the io falls flat as the state checking ends up nooping the error
handler.

Create an err_work work item that is scheduled upon an io error while
connecting. The work thread terminates all io on all queues and marks
the queues as not connected. The termination of the io will return
back to the callee, which will then back out of the connection attempt
and will reschedule, if possible, the connection attempt.

The changes:
- in case there are several commands hitting the error handler, a
state flag is kept so that the error work is only scheduled once,
on the first error. The subsequent errors can be ignored.
- The calling sequence to stop keep alive and terminate the queues
and their io is lifted from the reset routine. Made a small
service routine used by both reset and err_work.
- During debugging, found that the teardown path can reference
an uninitialized pointer, resulting in a NULL pointer oops.
The aen_ops weren't initialized yet. Add validation on their
initialization before calling the teardown routine.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 7baa8572 08-Nov-2018 Jens Axboe <axboe@kernel.dk>

blk-mq-tag: change busy_iter_fn to return whether to continue or not

We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.

Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d19b8bc8 27-Oct-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: fix request private initialization

The patch made to avoid Coverity reporting of out of bounds access
on aen_op moved the assignment of a pointer, leaving it null when it
was subsequently used to calculate a private pointer. Thus the private
pointer was bad.

Move/correct the private pointer initialization to be in sync with the
patch.

Fixes: 0d2bdf9f4134 ("nvme-fc: rework the request initialization code")
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0d2bdf9f 08-Oct-2018 Bart Van Assche <bvanassche@acm.org>

nvme-fc: rework the request initialization code

Instead of setting and then clearing the first_sgl pointer for AEN requests,
leave that pointer zero. This patch does not change how requests are
initialized but avoids that Coverity reports the following complaint for
nvme_fc_init_aen_ops():

CID 1418400 (#1 of 1): Out-of-bounds access (OVERRUN)
4. overrun-buffer-val: Overrunning buffer pointed to by aen_op of 312 bytes by passing it to a function which accesses it at byte offset 312.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d3d0bc78 08-Oct-2018 Bart Van Assche <bvanassche@acm.org>

nvme-fc: introduce struct nvme_fcp_op_w_sgl

This patch does not change any functionality but makes the intent of the
code more clear.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 76c910c7 08-Oct-2018 Bart Van Assche <bvanassche@acm.org>

nvme-fc: fix kernel-doc headers

This patch avoids that the kernel-doc tool complains about several
multiple function headers when building with W=1.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 97faec53 13-Sep-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device

The fc transport device should allow for a rediscovery, as userspace
might have lost the events. Example is udev events not handled during
system startup.

This patch add a sysfs entry 'nvme_discovery' on the fc class to
have it replay all udev discovery events for all local port/remote
port address pairs.

Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d4e4230c 10-Aug-2018 Milan P. Gandhi <mgandhi@redhat.com>

nvme-fc: fix for a minor typos

Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6cdefc6e 20-Jul-2018 James Smart <jsmart2021@gmail.com>

nvme: if_ready checks to fail io to deleting controller

The revised if_ready checks skipped over the case of returning error when
the controller is being deleted. Instead it was returning BUSY, which
caused the ios to retry, which caused the ns delete to hang waiting for
the ios to drain.

Stack trace of hang looks like:
kworker/u64:2 D 0 74 2 0x80000000
Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
Call Trace:
? __schedule+0x26d/0x820
schedule+0x32/0x80
blk_mq_freeze_queue_wait+0x36/0x80
? remove_wait_queue+0x60/0x60
blk_cleanup_queue+0x72/0x160
nvme_ns_remove+0x106/0x140 [nvme_core]
nvme_remove_namespaces+0x7e/0xa0 [nvme_core]
nvme_delete_ctrl_work+0x4d/0x80 [nvme_core]
process_one_work+0x160/0x350
worker_thread+0x1c3/0x3d0
kthread+0xf5/0x130
? process_one_work+0x350/0x350
? kthread_bind+0x10/0x10
ret_from_fork+0x1f/0x30

Extend nvmf_fail_nonready_command() to supply the controller pointer so
that the controller state can be looked at. Fail any io to a controller
that is deleting.

Fixes: 3bc32bb1186c ("nvme-fabrics: refactor queue ready check")
Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready")
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>


# 59e29ce6 29-Jun-2018 Sagi Grimberg <sagi@grimberg.me>

nvme: cache struct nvme_ctrl reference to struct nvme_request

We will need to reference the controller in the setup and completion
time for tracing and future traffic based keep alive support.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 02d62a8b 20-Jun-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: release io queues to allow fast fail

Rather than leaving io queues quiesced after tearing down an association,
restart them. This allows ios to be replayed, with fastfail ios terminating
and non-fastfail getting into loops of retry.

This follows rdma's lead.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimber.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 3bc32bb1 11-Jun-2018 Christoph Hellwig <hch@lst.de>

nvme-fabrics: refactor queue ready check

Move the is_connected check to the fibre channel transport, as it has no
meaning for other transports. To facilitate this split out a new
nvmf_fail_nonready_command helper that is called by the transport when
it is asked to handle a command on a queue that is not ready.

Also avoid a function call for the queue live fast path by inlining
the check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>


# 3e493c00 13-Jun-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: fix nulling of queue data on reconnect

The reconnect path is calling the init routines to clear a queue
structure. But the queue structure has state that perhaps needs
to persist as long as the controller is live.

Remove the nvme_fc_init_queue() calls on reconnect.
The nvme_fc_free_queue() calls will clear state bits and reset
any relevant queue state for a new connection.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 587331f7 13-Jun-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: remove reinit_request routine

The reinit_request routine is not necessary. Remove support for the
op callback.

As all that nvme_reinit_tagset() does is itterate and call the
reinit routine, it too has no purpose. Remove the call.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4c984154 13-Jun-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: change controllers first connect to use reconnect path

Current code follows the framework that has been in the transports
from the beginning where initial link-side controller connect occurs
as part of "creating the controller". Thus that first connect fully
talks to the controller and obtains values that can then be used in
for blk-mq setup, etc. It also means that everything about the
controller is fully know before the "create controller" call returns.

This has several weaknesses:
- The initial create_ctrl call made by the cli will block for a long
time as wire transactions are performed synchronously. This delay
becomes longer if errors occur or connectivity is lost and retries
need to be performed.
- Code wise, it means there is a separate connect path for initial
controller connect vs the (same) steps used in the reconnect path.
- And as there's separate paths, it means there's separate error
handling and retry logic. It also plays havoc with the NEW state
(should transition out of it after successful initial connect) vs
the RESETTING and CONNECTING (reconnect) states that want to be
transitioned to on error.
- As there's separate paths, to recover from errors and disruptions,
it requires separate recovery/retry paths as well and can severely
convolute the controller state.

This patch reworks the fc transport to use the same connect paths
for the initial connection as it uses for reconnect. This makes a
single path for error recovery and handling.

This patch:
- Removes the driving of the initial connect and replaces it with
a state transition to CONNECTING and initiating the reconnect
thread. A dummy state transition of RESETTING had to be traversed
as a direct transtion of NEW->CONNECTING is not allowed. Given
that the controller is "new", the RESETTING transition is a simple
no-op. Once in the reconnecting thread, the normal behaviors of
ctrl_loss_tmo (max_retries * connect_delay) and dev_loss_tmo will
apply before the controller is torn down.
- Only if the state transitions couldn't be traversed and the
reconnect thread not scheduled, will the controller be torn down
while in create_ctrl.
- The prior code used the controller state of NEW to indicate
whether request queues had been initialized or not. For the admin
queue, the request queue is always created, so there's no need to
check a state. For IO queues, change to tracking whether a successful
io request queue create has occurred (e.g. 1st successful connect).
- The initial controller id is initialized to the dynamic controller
id used in the initial connect message. It will be overwritten by
the real controller id once the controller is connected on the wire.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d250bf4e 30-May-2018 Christoph Hellwig <hch@lst.de>

blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter

We already check for started commands in all callbacks, but we should
also protect against already completed commands. Do this by taking
the checks to common code.

Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 90fcaf5d 11-May-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: remove setting DNR on exception conditions

Current code will set DNR if the controller is deleting or there is
an error during controller init. None of this is necessary.

Remove the code that sets DNR

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4fb135ad 19-Apr-2018 Johannes Thumshirn <jthumshirn@suse.de>

nvme: fc: provide a descriptive error

Provide a descriptive error in case an lport to rport association
isn't found when creating the FC-NVME controller.

Currently it's very hard to debug the reason for a failed connect
attempt without a look at the source.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>


# bb06ec31 12-Apr-2018 James Smart <jsmart2021@gmail.com>

nvme: expand nvmf_check_if_ready checks

The nvmf_check_if_ready() checks that were added are very simplistic.
As such, the routine allows a lot of cases to fail ios during windows
of reset or re-connection. In cases where there are not multi-path
options present, the error goes back to the callee - the filesystem
or application. Not good.

The common routine was rewritten and calling syntax slightly expanded
so that per-transport is_ready routines don't need to be present.
The transports now call the routine directly. The routine is now a
fabrics routine rather than an inline function.

The routine now looks at controller state to decide the action to
take. Some states mandate io failure. Others define the condition where
a command can be accepted. When the decision is unclear, a generic
queue-or-reject check is made to look for failfast or multipath ios and
only fails the io if it is so marked. Otherwise, the io will be queued
and wait for the controller state to resolve.

Admin commands issued via ioctl share a live admin queue with commands
from the transport for controller init. The ioctls could be intermixed
with the initialization commands. It's possible for the ioctl cmd to
be issued prior to the controller being enabled. To block this, the
ioctl admin commands need to be distinguished from admin commands used
for controller init. Added a USERCMD nvme_req(req)->rq_flags bit to
reflect this division and set it on ioctls requests. As the
nvmf_check_if_ready() routine is called prior to nvme_setup_cmd(),
ensure that commands allocated by the ioctl path (actually anything
in core.c) preps the nvme_req(req) before starting the io. This will
preserve the USERCMD flag during execution and/or retry.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.e>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0cdd5fca 05-Mar-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: on remoteport reuse, set new nport_id and role.

When reattaching to a removed remoteport that has not yet been
fully deleted as it's waiting for reconnect timeouts, be sure to
re-set the ports nport id and role.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b12740d3 28-Feb-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: fix abort race on teardown with lld reject

Another abort race: An io request is started, becomes active,
and is attempted to be started with the lldd. At the same time
the controller is stopped/torndown and an itterator is run to
abort the ios. As the io is active, it is added to the outstanding
aborted io count. However on the original io request thread, the
driver ends up rejecting the io due to the condition that induced
the controller teardown. The driver reject path didn't check whether
it was in the outstanding io count. This left the count outstanding
stopping controller teardown.

Correct by, in the driver reject case, setting the state to
inactive and checking whether it was in the outstanding io count.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 041018c6 12-Mar-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: io timeout should defer abort to ctrl reset

The current nvme_fc code, when an io times out, will abort the io
on the fc link, then call the error recovery routine to reset the
controller. It is during the reset of the controller that the
transport will wait for all ios to be aborted before sending a
Disconnect LS to the target.

However, the reset routine only waits for the io which it generates
the abort for to complete. Any io that was aborted just prior to the
reset isn't in it's list to wait for. Thus the Disconnect is getting
sent before the aborts have completed.

Correct by removing the abort in the timeout handler. The reset will
generate the abort. At that point the timeout handler can be simplified
to request the reset (via the error handler) and restart the timeout
timer.

Also fixes a small typo in a comment in the reset handler.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# cf25809b 13-Mar-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: fix ctrl create failures racing with workq items

If there are errors during initial controller create, the transport
will teardown the partially initialized controller struct and free
the ctlr memory. Trouble is - most of those errors can occur due
to asynchronous events happening such io timeouts and subsystem
connectivity failures. Those failures invoke async workq items to
reset the controller and attempt reconnect. Those may be in progress
as the main thread frees the ctrl memory, resulting in NULL ptr oops.

Prevent this from happening by having the main ctrl failure thread
changing state to DELETING followed by synchronously cancelling any
pending queued work item. The change of state will prevent the
scheduling of resets or reconnect events.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 77d0612d 11-Mar-2018 Max Gurtovoy <maxg@mellanox.com>

nvme: centralize ctrl removal prints

nvme_delete_ctrl can be called from various contexts in parallel,
and cause duplicated information prints, even though the specific
context doesn't perform the actual removal. Instead, print the
information when the actual removal occurs.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d157e534 07-Mar-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: rework sqsize handling

Corrected four outstanding issues in the transport around sqsize.

1: Create Connection LS is sending the 1's-based sqsize, should be
sending the 0's-based value.

2: allocation of hw queue is using the 0's-base size. It should be
using the 1's-based value.

3: normalization of ctrl.sqsize by MQES is using MQES+1 (1's-based
value). It should be MQES (0's-based value).

4: Missing clause to ensure queue_count not larger than ctrl->sqsize.

Corrected by:
Clean up routines that pass queue size around. The queue size value is
the actual count (1's-based) value and determined from ctrl->sqsize + 1.

Routines that send 0's-based value adapt from queue size.

Sset ctrl->sqsize properly for MQES.

Added clause to nsure queue_count not larger than ctrl->sqsize + 1.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>


# c3aedd22 06-Feb-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: cleanup io completion

There was some old cold that dealt with complete_rq being called
prior to the lldd returning the io completion. This is garbage code.
The complete_rq routine was being called after eh_timeouts were
called and it was due to eh_timeouts not being handled properly.
The timeouts were fixed in prior patches so that in general, a
timeout will initiate an abort and the reset timer restarted as
the abort operation will take care of completing things. Given the
reset timer restarted, the erroneous complete_rq calls were eliminated.

So remove the work that was synchronizing complete_rq with io
completion.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 3efd6e8e 06-Feb-2018 James Smart <jsmart2021@gmail.com>

nvme_fc: correct abort race condition on resets

During reset handling, there is live io completing while the reset
is taking place. The reset path attempts to abort all outstanding io,
counting the number of ios that were reset. It then waits for those
ios to be reclaimed from the lldd before continuing.

The transport's logic on io state and flag setting was poor, allowing
ios to complete simultaneous to the abort request. The completed ios
were counted, but as the completion had already occurred, the
completion never reduced the count. As the count never zeros, the
reset/delete never completes.

Tighten it up by unconditionally changing the op state to completed
when the io done handler is called. The reset/abort path now changes
the op state to aborted, but the abort only continues if the op
state was live priviously. If complete, the abort is backed out.
Thus proper counting of io aborts and their completions is working
again.

Also removed the TERMIO state on the op as it's redundant with the
op's aborted state.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# ad6a0a52 31-Jan-2018 Max Gurtovoy <maxg@mellanox.com>

nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING

In pci transport, this state is used to mark the initialization
process. This should be also used in other transports as well.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 86ff7c2a 30-Jan-2018 Ming Lei <ming.lei@redhat.com>

blk-mq: introduce BLK_STS_DEV_RESOURCE

This status is returned from driver to block layer if device related
resource is unavailable, but driver can guarantee that IO dispatch
will be triggered in future when the resource is available.

Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver
returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is
3 ms because both scsi-mq and nvmefc are using that magic value.

If a driver can make sure there is in-flight IO, it is safe to return
BLK_STS_DEV_RESOURCE because:

1) If all in-flight IOs complete before examining SCHED_RESTART in
blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
is run immediately in this case by blk_mq_dispatch_rq_list();

2) if there is any in-flight IO after/when examining SCHED_RESTART
in blk_mq_dispatch_rq_list():
- if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
- otherwise, this request will be dispatched after any in-flight IO is
completed via blk_mq_sched_restart()

3) if SCHED_RESTART is set concurently in context because of
BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
cases and make sure IO hang can be avoided.

One invariant is that queue will be rerun if SCHED_RESTART is set.

Suggested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0fd997d3 11-Jan-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: correct hang in nvme_ns_remove()

When connectivity is lost to a device, the association is terminated
and the blk-mq queues are quiesced/stopped. When connectivity is
re-established, they are resumed.

If connectivity is lost for a sufficient amount of time that the
controller is then deleted, the delete path starts tearing down queues,
and eventually calling nvme_ns_remove(). It appears that pending
commands may cause blk_cleanup_queue() to never complete and the
teardown stalls.

Correct by starting the ns queues after transitioning to a DELETING
state, allowing pending commands to be flushed with io failures. Thus
the delete path is clear when reached.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d625d05e 11-Jan-2018 James Smart <jsmart2021@gmail.com>

nvme-fc: fix rogue admin cmds stalling teardown

When connectivity is lost to a device, the association is terminated
and the blk-mq queues are quiesced/stopped. When connectivity is
re-established, they are resumed.

If an admin command is received while connectivity is list, the ioctl
queues the command on the admin_q and the command stalls (the thread
issuing the ioctl hangs/waits). if the connectivity is lost long
enough such that the controller is then deleted, the delete code
makes its calls to initiate the delete, which then expects the core
layer to call the transport when all references are removed and the
controller can be freed. Unfortunately, nothing in this path dequeued
the admin command, so a reference sits outstanding and things stop,
hanging the delete indefinitely.

Correct by unquiescing the admin queue in the delete association. This
means any admin command (which should only be from an ioctl) issued
after connectivity is lost will detect the controller is in a
reconnecting state and will (fast) fail the command. Thus, a pending
reference can no longer be created. Once connectivity is re-established,
a new ioctl/admin command would see proper device state and function again.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 0de5cd36 25-Dec-2017 Roy Shterman <roys@lightbitslabs.com>

nvme-fabrics: protect against module unload during create_ctrl

NVMe transport driver module unload may (and usually does) trigger
iteration over the active controllers and delete them all (sometimes
under a mutex). However, a controller can be created concurrently with
module unload which can lead to leakage of resources (most important char
device node leakage) in case the controller creation occured after the
unload delete and drain sequence. To protect against this, we take a
module reference to guarantee that the nvme transport driver is not
unloaded while creating a controller.

Signed-off-by: Roy Shterman <roys@lightbitslabs.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4596e752 29-Nov-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: remove double put reference if admin connect fails

There are two put references in the failure case of initial
create_association. The first put actually frees the controller, thus the
second put references freed memory.

Remove the unnecessary 2nd put.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 26c0a26d 24-Nov-2017 Jens Axboe <axboe@kernel.dk>

nvme-fc: don't use bit masks for set/test_bit() numbers

So far harmless, but it's confusing and a bug waiting to happen if the
shifts grow larger than 4.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9e0ed16a 24-Oct-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: check if queue is ready in queue_rq

In case the queue is not LIVE (fully functional and connected at the nvmf
level), we cannot allow any commands other than connect to pass through.

Add a new queue state flag NVME_FC_Q_LIVE which is set after nvmf connect
and cleared in queue teardown.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ad22c355 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme: remove handling of multiple AEN requests

The driver can handle tracking only one AEN request, so this patch
removes handling for multiple ones.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 08e15075 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme-fc: remove unused "queue_size" field

This was being saved in a structure, but never used anywhere. The queue
size is obtained through other means, so there's no reason to duplicate
this without a user for it.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 38dabe21 07-Nov-2017 Keith Busch <kbusch@kernel.org>

nvme: centralize AEN defines

All the transports were unnecessarilly duplicating the AEN request
accounting. This patch defines everything in one place.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 158bfb88 03-Nov-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: decouple ns references from lldd references

In the lldd api, a lldd may unregister a remoteport (loss of connectivity
or driver unload) or localport (driver unload). The lldd must wait for the
remoteport_delete or localport_delete before completing its actions post
the unregister. The xxx_deletes currently occur only when the xxxport
structure is fully freed after all references are removed. Thus the lldd
may be held hostage until an app or in-kernel entity that has a namespace
open finally closes so the namespace can be removed, the controller
removed, thus the transport objects, thus the lldd.

This patch decouples the transport and os-facing objects from the lldd
and the remoteport and localport. There is a point in all deletions
where the transport will no longer interact with the lldd on behalf of
a controller. That point centers around the association established
with the target/subsystem. It will access the lldd whenever it attempts
to create an association and while the association is active. New
associations may only be created if the remoteport is live (thus the
localport is live). It will not access the lldd after deleting the
association.

Therefore, the patch tracks the count of active controllers - those with
associations being created or that are active - on a remoteport. It also
tracks the number of remoteports that have active controllers, on a
a localport. When a remoteport is unregistered, as soon as there are no
active controllers, the lldd's remoteport_delete may be called and the
lldd may continue. Similarly, when a localport is unregistered, as soon
as there are no remoteports with active controllers, the localport_delete
callback may be made. This significantly speeds up unregistration with
the lldd.

The transport objects continue in suspended status with reconnect timers
running, and upon expiration, normal ref-counting will occur and the
objects will be freed. The transport object may still be held hostage
by the application/kernel module, but that is acceptable.

With this change, the lldd may be fully unloaded and reloaded, and
if registrations occur prior to the timeouts, the nvme controller and
namespaces will resume normally as if a link bounce.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c5760f30 03-Nov-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: fix localport resume using stale values

The localport resume was not updating the lldd ops structure. If the
lldd is unloaded and reloaded, the ops pointers will differ.

Additionally, as there are device references taken by the localport,
ensure that resume only resumes if the device matches as well.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2b632970 25-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: add dev_loss_tmo timeout and remoteport resume support

When a remoteport is unregistered (connectivity lost), the following
actions are taken:

- the remoteport is marked DELETED
- the time when dev_loss_tmo would expire is set in the remoteport
- all controllers on the remoteport are reset.

After a controller resets, it will stall in a RECONNECTING state waiting
for one of the following:

- the controller will continue to attempt reconnect per max_retries and
reconnect_delay. As no remoteport connectivity, the reconnect attempt
will immediately fail. If max reconnects has not been reached, a new
reconnect_delay timer will be schedule. If the current time plus
another reconnect_delay exceeds when dev_loss_tmo expires on the remote
port, then the reconnect_delay will be shortend to schedule no later
than when dev_loss_tmo expires. If max reconnect attempts are reached
(e.g. ctrl_loss_tmo reached) or dev_loss_tmo ix exceeded without
connectivity, the controller is deleted.
- the remoteport is re-registered prior to dev_loss_tmo expiring.
The resume of the remoteport will immediately attempt to reconnect
each of its suspended controllers.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: updated to use nvme_delete_ctrl]
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 96e24801 25-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: check connectivity before initiating reconnects

Check remoteport connectivity before initiating reconnects

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ac7fe82b 25-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: add a dev_loss_tmo field to the remoteport

Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device
connectivity loss.

The transport initializes the value in the nvme_fc_register_remoteport()
call. If the value is not set, a default of 60s is set.

Add a new routine to the api, nvme_fc_set_remoteport_devloss() routine,
which allows the lldd to dynamically update the value on an existing
remoteport.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 44c6ec77 25-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: change ctlr state assignments during reset/reconnect

Clean up some of the controller state checks and add the
RESETTING->RECONNECTING state transition.

Specifically:
- the movement of the RESETTING state change and schedule of reset_work
to core doesn't work wiht nvme_fc_error_recovery setting state to
RECONNECTING before attempting to reset. Remove the state change as
the reset request does it.
- In the rare cases where an error occurs right as we're transitioning
to LIVE, defer the controller start actions.
- In error handling on teardown of associations while performing initial
controller creation - avoid quiesce calls on the admin_q. They are
unneeded.
- Add the RESETTING->RECONNECTING transition in the reset handler.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 4054637c 29-Oct-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: flush reset_work before safely continuing with delete operation

Prevent racing controller reset and delete flows. reset_work must not
ever self-requeue so flushing it suffices.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 6cd53d14 29-Oct-2017 Christoph Hellwig <hch@lst.de>

nvme: consolidate common code from ->reset_work

No change in behavior except that the FC code cancels two work items a
little later now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c5017e85 29-Oct-2017 Christoph Hellwig <hch@lst.de>

nvme: move controller deletion to common code

Move the ->delete_work and the associated helpers to common code instead
of duplicating them in every driver. This also adds the missing reference
get/put for the loop driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 29c09648 29-Oct-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrl

No need to have two functions doing the same thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 71c691fd 28-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: avoid workqueue flush stalls

There's no need to wait for the full nvme_wq, which is now shared,
to flush. flush only the delete_work item.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sgi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# ecad0d2c 23-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: remove NVME_FC_MAX_SEGMENTS

The define is an arbitrary limit to the io size on the initiator,
capping the io to 1MB-4KB.

Remove the define from the transport. I/O size will solely be limited
by the LLDD sg limits.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 56d5f4f1 20-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: add support for duplicate_connect option

Adds support for the duplicate_connect option. When set to true,
checks whether there's an existing controller via the same host port
and target port for the same host (hostnqn, hostid) to the same
subsystem. Fails the connection request if an existing controller.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d22524a4 18-Oct-2017 Christoph Hellwig <hch@lst.de>

nvme: switch controller refcounting to use struct device

Instead of allocating a separate struct device for the character device
handle embedd it into struct nvme_ctrl and use it for the main controller
refcounting. This removes double refcounting and gets us an automatic
reference for the character device operations. We keep ctrl->device as a
pointer for now to avoid chaning printks all over, but in the future we
could look into message printing helpers that take a controller structure
similar to what other subsystems do.

Note the delete_ctrl operation always already has a reference (either
through sysfs due this change, or because every open file on the
/dev/nvme-fabrics node has a refernece) when it is entered now, so we
don't need to do the unless_zero variant there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>


# 134aedc9 19-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: correct io timeout behavior

The transport io timeout behavior wasn't quite correct. It ignored
that the io error handler is supposed to be synchronous so it possibly
allowed the blk request to be restarted while the io associated was
still aborting. Timeouts on reserved commands, those used for
association create, were never timing out thus they hung out forever.

To correct:
If an io is times out while a remoteport is not connected, just
restart the io timer. The lack of connectivity will simultaneously
be resetting the controller, so the reset path will abort and terminate
the io.

If an io is times out while it was marked for transport abort, just
reset the io timer. The abort process is underway and will complete
the io.

Otherwise, if an io times out, abort the io. If the abort was
unsuccessful (unlikely) give up and return not handled.

If the abort was successful, as the abort process is underway it will
terminate the io, so rather than synchronously waiting, just restart
the io timer.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 0a02e39f 19-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: correct io termination handling

The io completion handling for i/o's that are failing due to
to a transport error or association termination had issues, causing
io failures (DNR set so retries didn't kick in) or long stalls.

Change the io completion handler for the following items:

When an io has been completed due to a transport abort (based on an
exchange error) or when marked as aborted as part of an association
termination (FCOP_FLAGS_TERMIO), set the NVME completion status to
NVME_SC_ABORTED. By default, do not set DNR on the status so that a
retry can be attempted after association recreate.

In cases where an io is failed (non-successful nvme status including
aborted), if the controller is being deleted (blk_queue_dying) or
the io was part of the ios used for association creation (ctrl state
is NEW or RECONNECTING), then additionally set the DNR bit so the io
will not be retried. If the failed io was part of association creation,
the failure will tear down the partially completioned association and
typically restart a new reconnect attempt (another create association
later).

Rearranged code flow to remove a largely unneeded local variable.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5a22e2bf 17-Oct-2017 Israel Rukshin <israelr@mellanox.com>

nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag set

Since commit b86dd81
"block: get rid of blk-mq default scheduler choice Kconfig entries",
when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler.
This flag is useful for admin queues that aren't used for normal IO.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 17c4dc6e 09-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: retry initial controller connections 3 times

Currently, if a frame is lost of command fails as part of initial
association create for a new controller, the new controller connection
request will immediately fail.

Add in an immediate 3 retry loop before giving up.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8a82dbf1 09-Oct-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: fix iowait hang

Add missing iowait head initialization.
Fix irqsave vs irq: wait_event_lock_irq() doesn't do irq save/restore

Fixes: 36715cf4b366 ("nvme_fc: replace ioabort msleep loop with completion”)
Cc: <stable@vger.kernel.org> # 4.13
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Tested-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 31b84460 10-Oct-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: introduce nvme_reinit_tagset

Move blk_mq_reinit_tagset from blk-mq to nvme core
as the only user of it. Current transports that use
it (rdma, fc) simply implement .reinit_request op.

This patch does not change any functionality.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 469d0ef0 26-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: move remote port get/put/free location

move nvme_fc_rport_get/put and rport free to higher in the file to
avoid adding prototypes to resolve references in upcoming code additions

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5f568556 14-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: create fc class and transport device

Added a new fc class and a device node for udev events under it. I
expect the fc class will eventually be the location where the FC SCSI and
FC NVME merge in the future. Therefore names are kept somewhat generic.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# eaefd5ab 14-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: add uevent for auto-connect

To support auto-connecting to FC-NVME devices upon their dynamic
appearance, add a uevent that can kick off connection scripts.
uevent is posted against the fc_udev device.

patch set tested with the following rule to kick an nvme-cli connect-all
for the FC initiator and FC target ports. This is just an example for
testing and not intended for real life use.

ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'"

I will post proposed udev/systemd scripts for possible kernel support.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d9d34c0b 07-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: use transport-specific sgl format

Sync with NVM Express spec change and FC-NVME 1.18.

FC transport sets SGL type to Transport SGL Data Block Descriptor and
subtype to transport-specific value 0x0A.

Removed the warn-on's on the PRP fields. They are unneeded. They were
to check for values from the upper layer that weren't set right, and
for the most part were fine. But, with Async events, which reuse the
same structure and 2nd time issued the SGL overlay converted them to
the Transport SGL values - the warn-on's were errantly firing.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 56b7103a 07-Sep-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: remove use of FC-specific error codes

The FC-NVME transport used the FC-specific error codes in cases where
it had to fabricate an error to go back up stack. Instead of using the
FC-specific values, now use a generic value (NVME_SC_INTERNAL).

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5533d424 31-Jul-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: Reattach to localports on re-registration

If the LLDD resets or detaches from an fc port, the LLDD will
deregister all remoteports seen by the fc port and deregister the
localport associated with the fc port. The teardown of the localport
structure will be held off due to reference counting until all the
remoteports are removed (and they are held off until all
controllers/associations to terminated). Currently, if the fc port
is reinit/reattached and registered again as a localport it is
treated as an independent entity from the prior localport and all
prior remoteports and controllers cannot be revived. They are
created as new and separate entities.

This patch changes the localport registration to look at the known
localports that are waiting to be torndown. If they are the same port
based on wwn's, the local port is transitioned out of the teardown
state. This allows the remote ports and controller connections to
be reestablished and resumed as long as the localport can also be
reregistered within the timeout windows.

The patch adds a new routine nvme_fc_attach_to_unreg_lport() with
the functionality and moves the lport get/put routines to avoid
forward references.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 34b6c231 10-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: Add admin_tagset pointer to nvme_ctrl

Will be used when we centralize control flows.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d352ae20 17-Aug-2017 Bart Van Assche <bvanassche@acm.org>

blk-mq: Make blk_mq_reinit_tagset() calls easier to read

Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9c5358e1 17-Jul-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: revise TRADDR parsing

The FC-NVME spec hasn't locked down on the format string for TRADDR.
Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
where the wwn's are hex values but not prefixed by 0x.

Most implementations so far expect a string format of
"nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
uses the match_u64 parser which requires a leading 0x prefix to set
the base properly. If it's not there, a match will either fail or return
a base 10 value.

The resolution in T11 is pushing out. Therefore, to fix things now and
to cover any eventuality and any implementations already in the field,
this patch adds support for both formats.

The change consists of replacing the token matching routine with a
routine that validates the fixed string format, and then builds
a local copy of the hex name with a 0x prefix before calling
the system parser.

Note: the same parser routine exists in both the initiator and target
transports. Given this is about the only "shared" item, we chose to
replicate rather than create an interdendency on some shared code.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8b25f351 18-Jul-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: address target disconnect race conditions in fcp io submit

There are cases where threads are in the process of submitting new
io when the LLDD calls in to remove the remote port. In some cases,
the next io actually goes to the LLDD, who knows the remoteport isn't
present and rejects it. To properly recovery/restart these i/o's we
don't want to hard fail them, we want to treat them as temporary
resource errors in which a delayed retry will work.

Add a couple more checks on remoteport connectivity and commonize the
busy response handling when it's seen.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# d09f2b45 02-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: split nvme_uninit_ctrl into stop and uninit

Usually before we teardown the controller we want to:
1. complete/cancel any ctrl inflight works
2. remove ctrl namespaces (only for removal though, resets
shouldn't remove any namespaces).

but we do not want to destroy the controller device as
we might use it for logging during the teardown stage.

This patch adds nvme_start_ctrl() which queues inflight
controller works (aen, ns scan, queue start and keep-alive
if kato is set) and nvme_stop_ctrl() which cancels the works
namespace removal is left to the callers to handle.

Move nvme_uninit_ctrl after we are done with the
controller device.

Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# f9c5af5f 02-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues

unlike blk_mq_stop_hw_queues and blk_mq_start_stopped_hw_queues
quiescing/unquiescing respects the submission path rcu grace.

Also, make sure to unquiesce before cleanup the admin queue.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-By: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# e5859d3a 04-Jul-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: use blk_mq_delay_run_hw_queue instead of open-coding it

Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# cda5fd1a 29-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: update tagset nr_hw_queues after queues reinit

We might have more/less queues once we reconnect/reset. For
example due to cpu going online/offline or controller constraints.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 7314183d 29-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: don't override opts->nr_io_queues

Its what the user passed, so its probably a better
idea to keep it intact. Also, limit the number of
I/O queues to max online cpus and the lport maximum
hw queues.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 20d0dfe6 27-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: move ctrl cap to struct nvme_ctrl

All transports use either a private cache of controller cap or an on-stack
copy, move it to the generic struct nvme_ctrl. In the future it will also
be maintained by the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# d858e5f0 24-Apr-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: move queue_count to the nvme_ctrl

All all transports use the queue_count in exactly the same, so move it to
the generic struct nvme_ctrl. In the future it will also be maintained by
the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 69fa9646 21-Jun-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: fix error recovery on link down.

Currently, the fc transport invokes nvme_fc_error_recovery() on every
io in which the transport detects an error. Which means:
a) it's really noisy on large io loads that all get hit by a link down.
b) we repeatively call nvme_stop_queues() even though queues are
stopped upon the first error or as first steps of reset_work.

Correct by:
Errors are only meaningful if the controller is in the LIVE state.
Thus, enact the reset_work only if LIVE. If called repeatively, state
will have already transitioned.
There's no need to stop the queues here. Let the first steps of
reset_work do the queue stopping.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0b5a7669 16-Jun-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Fix crash when nvme controller connection fails.

If a controller connection is attempted (say to a subsystem that
does not exist), the first attempt errors out. If another connect
is attempted, it crashes.

Issue is the prior controller has yet execute it's final put, thus
its still on lists. However, opts points on it have been cleared, thus
causing the crash if they are referenced.

Fix is to add the missing put after the nvme_uninit_ctrl() call on
the attachment failure.

Signed-off-by: Paul Ely <Paul.Ely@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 36715cf4 22-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: replace ioabort msleep loop with completion

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Wait for io aborts to complete wait converted from msleep look to
using a struct completion.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# b4dfd6ee 21-Jun-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: fix double calls to nvme_cleanup_cmd()

Current fc transport code, on io termination, is calling
nvme_cleanup_cmd() followed by the transport dma unmap routine
which also calls nvme_cleanup_cmd(). Which means two kfrees occur
on the same address, raising havoc. This resulted in odd data errors,
effectively corruption..

Fix by removing the extraneous double calls. Call now occurs only in
teardown paths and as part of dma unmap routine.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 180de007 25-Jun-2017 Christoph Hellwig <hch@lst.de>

nvme: read the subsystem NQN from Identify Controller

NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures. Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it. For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7aa1f427 18-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: use a single NVME_AQ_DEPTH and relax it to 32

No need to differentiate fabrics from pci/loop, also lower
it to 32 as we don't really need 256 inflight admin commands.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d86c4d8e 15-Jun-2017 Christoph Hellwig <hch@lst.de>

nvme: move reset workqueue handling to common code

This moves the nvme_reset function from the PCIe driver to common code,
renaming it to nvme_reset_ctrl in the process. Additionally a new
helper nvme_reset_ctrl_sync is added for the case where we want to
wait for the reset. To facilitate that the reset_work work structure is
move to the common nvme_ctrl structure and the ->reset_ctrl method is
removed. For now the drivers initialize the reset_work with their own
callback, but longer term we should move to callouts for specific
parts of the reset process and move even more code to the core.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# 76f983cb 13-Jun-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: merge init_request methods

Now that we get the tagset passed we can have a single implementation for
the I/O and admin queues.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# fdf9dfa8 04-May-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: move nr_reconnects to nvme_ctrl

It is not a user option but rather a variable controller
attribute.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 9a6327d2 07-Jun-2017 Sagi Grimberg <sagi@grimberg.me>

nvme: Move transports to use nvme-core workqueue

Instead of each transport using it's own workqueue, export
a single nvme-core workqueue and use that instead.

In the future, this will help us moving towards some unification
if controller setup/teardown flows.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# fc17b653 03-Jun-2017 Christoph Hellwig <hch@lst.de>

blk-mq: switch ->queue_rq return value to blk_status_t

Use the same values for use for request completion errors as the return
value from ->queue_rq. BLK_STS_RESOURCE is special cased to cause
a requeue, and all the others are completed as-is.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 24b7f059 05-Jun-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: fix missing put reference on controller create failure

The failure case, of a create controller request, called
nvme_uninit_ctrl() but didn't do a put to allow the nvme
controller to be deleted.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# f874d5d0 01-Jun-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: on lldd/transport io error, terminate association

Per FC-NVME, when lldd or transport detects an i/o error, the
connection must be terminated, which in turn requires the association
to be termianted. Currently the transport simply creates a nvme
completion status of transport error and returns the io. The FC-NVME
spec makes the mandate as initiator and host, depending on the error,
can get out of sync on outstanding io counts (sqhd/sqtail).

Implement the association teardown on lldd or transport detected
errors.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


# 8e412263 17-May-2017 Christoph Hellwig <hch@lst.de>

nvme: switch to uuid_t

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# d3d5b87d 20-May-2017 Christoph Hellwig <hch@lst.de>

nvme: replace is_flags field in nvme_ctrl_ops with a flags field

So that we can have more flags for transport-specific behavior.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>


# 2cb657bc 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: remove extra controller reference taken on reconnect

fix extra controller reference taken on reconnect by moving
reference to initial controller create

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e392e1f1 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: correct nvme status set on abort

correct nvme status set on abort. Patch that changed status to being actual
nvme status crossed in the night with the patch that added abort values.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 589ff775 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: set logging level on resets/deletes

Per the review by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Looked at existing warn vs info vs err dev_xxx levels for the messages
printed on reconnects and deletes:
- Resets due to error and resets transitioned to deletes are dev_warn
- Other reset/disconnect messages are dev_info
- Removed chatty io queue related messages

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# a5321aa5 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: revise comment on teardown

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

An extra reference was pointed out. There's no issue with the
references, but rather a literal interpretation of what the comment
is saying.

Reword the comment to avoid confusion.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5bbecdbc 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Support ctrl_loss_tmo

Sync with Sagi's recent addition of ctrl_loss_tmo in the core fabrics
layer.

Remove local connect limits and connect_attempts variable.
Use fabrics new nr_connects variable and use of nvmf_should_reconnect()
Refactor duplicate reconnect failure code.

Addresses review comment by Sagi on controller reset support:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 0ce872bf 15-May-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: get rid of local reconnect_delay

Remove the local copy of reconnect_delay.
Use the value in the controller options directly.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 2952a879 25-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: stop queues on error detection

Per the recommendation by Sagi on:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009261.html

Rather than waiting for reset work thread to stop queues and abort the ios,
immediately stop the queues on error detection. Reset thread will restop
the queues (as it's called on other paths), but it does not appear to have
a side effect.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 85e6a6ad 05-May-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: require target or discovery role for fc-nvme targets

In order to create an association, the remoteport must be
serving either a target role or a discovery role.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# d6296d39 01-May-2017 Christoph Hellwig <hch@lst.de>

blk-mq: update ->init_request and ->exit_request prototypes

Remove the request_idx parameter, which can't be used safely now that we
support I/O schedulers with blk-mq. Except for a superflous check in
mtip32xx it was unused anyway.

Also pass the tag_set instead of just the driver data - this allows drivers
to avoid some code duplication in a follow on cleanup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# de41447a 24-Apr-2017 Ewan D. Milne <emilne@redhat.com>

nvme-fc: avoid memory corruption caused by calling nvmf_free_options() twice

Do not call nvmf_free_options() from the nvme_fc_ctlr destructor if
nvme_fc_create_ctrl() returns an error, because nvmf_create_ctrl()
frees the options when an error is returned.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# baee29ac 21-Apr-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: mark two symbols static

Found by sparse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>


# 61bff8ef 23-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: add controller reset support

This patch actually does quite a few things. When looking to add
controller reset support, the organization modeled after rdma was
very fragmented. rdma duplicates the reset and teardown paths and does
different things to the block layer on the two paths. The code to build
up the controller is also duplicated between the initial creation and
the reset/error recovery paths. So I decided to make this sane.

I reorganized the controller creation and teardown so that there is a
connect path and a disconnect path. Initial creation obviously uses
the connect path. Controller teardown will use the disconnect path,
followed last access code. Controller reset will use the disconnect
path to stop operation, and then the connect path to re-establish
the controller.

Along the way, several things were fixed
- aens were not properly set up. They are allocated differently from
the per-request structure on the blk queues.
- aens were oddly torn down. the prior patch corrected to abort, but
we still need to dma unmap and free relative elements.
- missed a few ref counting points: in aen completion and on i/o's
that fail
- controller initial create failure paths were still confused vs teardown
before converting to ref counting vs after we convert to refcounting.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 78a7ac26 23-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: add aen abort to teardown

Add abort support for aens. Commonized the op abort to apply to aen or
real ios (caused some reorg/routine movement). Abort path sets termination
flag in prep for next patch that will be watching i/o abort completion
before proceeding with controller teardown.

Now that we're aborting aens, the "exit" code that simply cleared out
their context no longer applies.

Also clarified how we detect an AEN vs a normal io - by a flag, not
by whether a rq exists or the a rqno is out of range.

Note: saw some interesting cases where if the queues are stopped and
we're waiting for the aborts, the core layer can call the complete_rq
callback for the io. So the io completion synchronizes link side completion
with possible blk layer completion under error.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 458f280d 23-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: fix command id check

The code validates the command_id in the response to the original
sqe command. But prior code was using the rq->rqno as the sqe command
id. The core layer overwrites what the transport set there originally.

Use the actual sqe content.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 8d64daf7 11-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Add ls aborts on remote port teardown

remoteport teardown never aborted the LS opertions. Add support.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# c913a8b0 11-Apr-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Move LS's to rport

Link LS's on the remoteport rather than the controller. LS's are
between nport's. Makes more sense, especially on async teardown where
the controller is torn down regardless of the LS (LS is more of a notifier
to the target of the teardown), to have them on the remoteport.

While revising ls send/done routines, issues were seen relative to
refcounting and cleanup, especially in async path. Reworked these code
paths.

Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# 27fa9bc5 20-Apr-2017 Christoph Hellwig <hch@lst.de>

nvme: split nvme status from block req->errors

We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.

Just store the status and result field in the nvme_request field from
hardirq completion context (using a new helper) and then generate a
Linux errno for the block layer only when we actually need it.

Because we can't overload the status value with a negative error code
for cancelled command we now have a flags filed in struct nvme_request
that contains a bit for this condition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# d663b69f 20-Apr-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: fix status code handling in nvme_fc_fcpio_done

nvme_complete_async_event expects the little endian status code
including the phase bit, and a new completion handler I plan to
introduce will do so as well.

Change the status variable into the little endian format with the
phase bit used in the NVMe CQE to fix / enable this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# c6c64a94 06-Apr-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: Fix sqsize wrong assignment based on ctrl MQES capability

both our sqsize and the controller MQES cap are a 0 based value,
so making it 1 based is wrong.

Reported-by: Trapp, Darren <Darren.Trapp@cavium.com>
Reported-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 77f02a7a 30-Mar-2017 Christoph Hellwig <hch@lst.de>

nvme: factor request completion code into a common helper

This avoids duplicating the logic four times, and it also allows to keep
some helpers static in core.c or just opencode them.

Note that this loses printing the aborted status on completions in the
PCI driver as that uses a data structure not available any more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 4bca70d0 30-Mar-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: drop ctrl for all command completions

A requeue means we go through nvme_fc_start_fcp_op again and get
another controller reference. To make sure the refcount doesn't
leak we also need to drop it for every completion that came from
the LLDD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f2cd54d3 29-Mar-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: increment request retries counter before requeuing

This way our max retry limit holds as well.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 62eeacb0 23-Mar-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Clean up host fcpio done status handling

As Dan Carpenter pointed out: mixing 16-bit nvme status with 32-bit
error status from driver. Corrected comment on fcp request struct
status field, and converted done routine to explicitly set nvme status
codes for nvme status.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f77fc87c 23-Mar-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: correct LS validation

LS validations shouldn't have been independent checks.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 726a1080 23-Mar-2017 James Smart <jsmart2021@gmail.com>

nvme_fc: Add check of status_code in ERSP_IU

Add check of status_code in ERSP_IU

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# c0e4a6f5 19-Mar-2017 Sagi Grimberg <sagi@grimberg.me>

nvme-fc: fix module_init (theoretical) error path

If nvmf_register_transport happened to fail
(it can't, but theoretically) we leak memory.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f363b089 30-Mar-2017 Eric Biggers <ebiggers@google.com>

blk-mq: constify struct blk_mq_ops

Constify all instances of blk_mq_ops, as they are never modified.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# faef3af6 17-Feb-2017 James Smart <jsmart2021@gmail.com>

nvme-fc: don't bother to validate ioccsz and iorcsz

Discovery controllers don't set the values. They are in reserved
areas of the Identify Controller data structure.

Given the cmd completed, the minimal capsule sizes are supported,
so no need to check nqn to detect discovery controllers and
special case validations.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e5a39dd8 27-Jan-2017 Johannes Thumshirn <jthumshirn@suse.de>

nvme: make nvmf_register_transport require a create_ctrl callback

nvmf_create_ctrl() relys on the presence of a create_crtl callback in the
registered nvmf_transport_ops, so make nvmf_register_transport require one.

Update the available call-sites as well to reflect these changes.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 57292b58 31-Jan-2017 Christoph Hellwig <hch@lst.de>

block: introduce blk_rq_is_passthrough

This can be used to check for fs vs non-fs requests and basically
removes all knowledge of BLOCK_PC specific from the block layer,
as well as preparing for removing the cmd_type field in struct request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 19e420bb 19-Jan-2017 Christoph Hellwig <hch@lst.de>

nvme-fc: use blk_rq_nr_phys_segments

Without this deallocate won't work properly due to the mismatch
of the bio/request size and the actual payload size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>


# b131c61d 12-Jan-2017 Christoph Hellwig <hch@lst.de>

nvme: use blk_rq_payload_bytes

The new blk_rq_payload_bytes generalizes the payload length hacks
that nvme_map_len did before.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 17a1ec08 15-Dec-2016 Johannes Thumshirn <jthumshirn@suse.de>

nvme/fc: simplify error handling of nvme_fc_create_hw_io_queues

Simplify the error handling of nvme_fc_create_hw_io_queues(), this saves us
one variable and one level of indentation.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviwed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c7034898 20-Dec-2016 James Smart <james.smart@broadcom.com>

nvme/fc: correct some printk information

Dan Carpenters's tool caught a pointer reference - should have been
just ptr, not &ptr.

Don't bother. Remove the pointer value in the printf. Its irrelevant.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# e399441d 02-Dec-2016 James Smart <jsmart2021@gmail.com>

nvme-fabrics: Add host support for FC transport

Implements the FC-NVME T11 definition of how nvme fabric capsules are
performed on an FC fabric. Utilizes a lower-layer API to FC host adapters
to send/receive FC-4 LS operations and FCP operations that comprise NVME
over FC operation.

The T11 definitions for FC-4 Link Services are implemented which create
NVMeOF connections. Implements the hooks with blk-mq to then submit admin
and io requests to the different connections.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>