#
40547052 |
|
06-Feb-2024 |
Francis Pravin <francis.p@samsung.com> |
nvme: use ns->head->pi_size instead of t10_pi_tuple structure size Currently kernel supports 8 byte and 16 byte protection information. So, use ns->head->pi_size instead of sizeof(struct t10_pi_tuple). Signed-off-by: Francis Pravin <francis.p@samsung.com> Signed-off-by: Sathyavathi M <sathya.m@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
9419e71b |
|
18-Dec-2023 |
Daniel Wagner <dwagner@suse.de> |
nvme: move ns id info to struct nvme_ns_head Move the namespace info to struct nvme_ns_head, because it's the same for all associated namespaces. Note: with multipathing enabled the PI information is shared between all paths. If a path is using a different PI configuration it will overwrite the previous settings. This is obviously not correct and such configuration will be rejected in future. For the time being we expect a correctly configured storage. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
b66509b8 |
|
30-Nov-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: split out cmd api into a separate header linux/io_uring.h is slowly becoming a rubbish bin where we put anything exposed to other subsystems. For instance, the task exit hooks and io_uring cmd infra are completely orthogonal and don't need each other's definitions. Start cleaning it up by splitting out all command bits into a new header file. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d6aacee9 |
|
30-Nov-2023 |
Keith Busch <kbusch@kernel.org> |
nvme: use bio_integrity_map_user Map user metadata buffers directly. Now that the bio tracks the metadata, nvme doesn't need special metadata handling and tracking with callbacks and additional fields in the pdu. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20231130215309.2923568-3-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7be866b1 |
|
02-May-2023 |
Keith Busch <kbusch@kernel.org> |
nvme-ioctl: move capable() admin check to the end This can be an expensive call on some kernel configs. Move it to the end after checking the cheaper ways to determine if the command is allowed. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
1147dd05 |
|
18-Oct-2023 |
Anuj Gupta <anuj20.g@samsung.com> |
nvme: fix error-handling for io_uring nvme-passthrough Driver may return an error before submitting the command to the device. Ensure that such error is propagated up. Fixes: 456cba386e94 ("nvme: wire-up uring-cmd support for io-passthru on char-device.") Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
2b32c76e |
|
16-Oct-2023 |
Keith Busch <kbusch@kernel.org> |
nvme: sanitize metadata bounce buffer for reads User can request more metadata bytes than the device will write. Ensure kernel buffer is initialized so we're not leaking unsanitized memory on the copy-out. Fixes: 0b7f1f26f95a51a ("nvme: use the block layer for userspace passthrough metadata") Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
#
80814b8e |
|
02-Aug-2023 |
Jinyoung Choi <j-young.choi@samsung.com> |
bio-integrity: update the payload size in bio_integrity_add_page() Previously, the bip's bi_size has been set before the integrity pages were added. If a problem occurs in the process of adding pages for bip, the bi_size mismatch problem must be dealt with. When the page is successfully added to bvec, the bi_size is updated. The parts affected by the change were also contained in this commit. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com> Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230803024956epcms2p38186a17392706650c582d38ef3dbcd32@epcms2p3 Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a7a7dabb |
|
08-Aug-2023 |
Ming Lei <ming.lei@redhat.com> |
nvme: core: don't hold rcu read lock in nvme_ns_chr_uring_cmd_iopoll Now nvme_ns_chr_uring_cmd_iopoll() has switched to request based io polling, and the associated NS is guaranteed to be live in case of io polling, so request is guaranteed to be valid because blk-mq uses pre-allocated request pool. Remove the rcu read lock in nvme_ns_chr_uring_cmd_iopoll(), which isn't needed any more after switching to request based io polling. Fix "BUG: sleeping function called from invalid context" because set_page_dirty_lock() from blk_rq_unmap_user() may sleep. Fixes: 585079b6e425 ("nvme: wire up async polling for io passthrough commands") Reported-by: Guangwu Zhang <guazhang@redhat.com> Cc: Kanchan Joshi <joshi.k@samsung.com> Cc: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guangwu Zhang <guazhang@redhat.com> Link: https://lore.kernel.org/r/20230809020440.174682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9408d8a3 |
|
12-Jun-2023 |
Keith Busch <kbusch@kernel.org> |
nvme: improved uring polling Drivers can poll requests directly, so use that. We just need to ensure the driver's request was allocated from a polled hctx, so a special driver flag is added to struct io_uring_cmd. This allows unshared and multipath namespaces to use the same polling callback, and multipath is guaranteed to get the same queue as the command was submitted on. Previously multipath polling might check a different path and poll the wrong info. The other bonus is we don't need a bio payload in order to poll, allowing commands like 'flush' and 'write zeroes' to be submitted on the same high priority queue as read and write commands. Finally, using the request based polling skips the unnecessary bio overhead. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230612190343.2087040-3-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
05bdb996 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7d9d7d59 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
nvme: replace the fmode_t argument to the nvme ioctl handlers with a simple bool Instead of passing a fmode_t and only checking it for FMODE_WRITE, pass a bool open_for_write to prepare for callers that won't have the fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-22-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
31a59782 |
|
26-May-2023 |
min15.li <min15.li@samsung.com> |
nvme: fix miss command type check In the function nvme_passthru_end(), only the value of the command opcode is checked, without checking the command type (IO command or Admin command). When we send a Dataset Management command (The opcode of the Dataset Management command is the same as the Set Feature command), kernel thinks it is a set feature command, then sets the controller's keep alive interval, and calls nvme_keep_alive_work(). Signed-off-by: min15.li <min15.li@samsung.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
|
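The opcode collision described in the "fix miss command type check" entry can be illustrated with a small user-space model. This is only a sketch: `passthru_cmd_model` and `is_set_features()` are hypothetical names, not the driver's code. The one fact it relies on is that opcode 0x09 means Set Features on the admin queue and Dataset Management on an I/O queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode 0x09 is Set Features (admin) and Dataset Management (I/O). */
#define OPC_ADMIN_SET_FEATURES 0x09
#define OPC_IO_DSM             0x09

/* Hypothetical stand-in for a passthrough command plus its queue type. */
struct passthru_cmd_model {
    bool    is_admin;   /* true if issued as an admin command */
    uint8_t opcode;
};

/*
 * Checking the opcode alone would misclassify a Dataset Management
 * command as Set Features, so the command type is checked as well.
 */
static bool is_set_features(const struct passthru_cmd_model *cmd)
{
    assert(cmd != NULL);
    return cmd->is_admin && cmd->opcode == OPC_ADMIN_SET_FEATURES;
}
```

With this check, an I/O command carrying opcode 0x09 no longer triggers the Set Features side effects such as updating the keep-alive interval.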
#
f026be0e |
|
15-May-2023 |
Pavel Begunkov <asml.silence@gmail.com> |
nvme: optimise io_uring passthrough completion Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough commands completion. It further delays the execution of task_work for DEFER_TASKRUN until there are enough of task_work items queued to meet the waiting criteria, which reduces the number of wake ups we issue. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ecdfacd0967a22d88b7779e2efd09e040825d0f8.1684154817.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
fd9b8547 |
|
04-May-2023 |
Breno Leitao <leitao@debian.org> |
io_uring: Pass whole sqe to commands Currently uring CMD operation relies on having large SQEs, but future operations might want to use normal SQE. The io_uring_cmd currently only saves the payload (cmd) part of the SQE, but, for commands that use normal SQE size, it might be necessary to access the initial SQE fields outside of the payload/cmd block. So, save the whole SQE rather than just the pdu. This changes slightly how the io_uring_cmd works, since the cmd structures and callbacks are not opaque to io_uring anymore. I.e., the callbacks can look at the SQE entries, not only at the cmd structure. The main advantage is that we don't need to create custom structures for simple commands. Create io_uring_sqe_cmd(), which returns the cmd private data as a void pointer and avoids casting in the callee side. Also, make most of ublk_drv's sqe->cmd priv structure into const, and use io_uring_sqe_cmd() to get the private structure, removing the unwanted cast. (There is one case where the cast is still needed since the header->{len,addr} is updated in the private structure) Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20230504121856.904491-3-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9d2789ac |
|
20-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
block/io_uring: pass in issue_flags for uring_cmd task_work handling io_uring_cmd_done() currently assumes that the uring_lock is held when invoked, and while it generally is, this is not guaranteed. Pass in the issue_flags associated with it, so that we have IO_URING_F_UNLOCKED available to be able to lock the CQ ring appropriately when completing events. Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
62281b9e |
|
14-Dec-2022 |
Christoph Hellwig <hch@lst.de> |
nvme: remove nvme_execute_passthru_rq After moving the nvme_passthru_end call to the callers of nvme_execute_passthru_rq, this function has become quite pointless, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
#
888545cb |
|
17-Jan-2023 |
Anuj Gupta <anuj20.g@samsung.com> |
nvme: set REQ_ALLOC_CACHE for uring-passthru request This patch sets REQ_ALLOC_CACHE flag for uring-passthru requests. This is a prep-patch so that normal / IRQ-driven uring-passthru I/Os can also leverage bio-cache. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20230117120638.72254-2-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
313c08c7 |
|
07-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
nvme: don't allow unprivileged passthrough on partitions Passthrough commands can always access the entire device, and thus submitting them on partitions is a privilege escalation. In hindsight we should have never allowed any passthrough commands on partitions, but it's probably too late to change that decision now. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
#
7b7fdb8e |
|
07-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
nvme: replace the "bool vec" arguments with flags in the ioctl path To prepare for passing down more information, replace the boolean vec argument with a more extensible flags one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
#
2fa1dc86 |
|
07-Jan-2023 |
Christoph Hellwig <hch@lst.de> |
nvme: remove __nvme_ioctl Open code __nvme_ioctl in the two callers to make future changes that pass down additional parameters in the ioctl path easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
#
6f99ac04 |
|
13-Dec-2022 |
Christoph Hellwig <hch@lst.de> |
nvme: consult the CSE log page for unprivileged passthrough Commands like Write Zeroes can change the contents of a namespace without actually transferring data. To protect against this, check the Commands Supported and Effects log is supported by the controller for any unprivileged command passthrough and refuse unprivileged passthrough if the command has any effects that can change data or metadata. Note: While the Commands Supported and Effects log page has only been mandatory since NVMe 2.0, it is widely supported because Windows requires it for any command passthrough from userspace. Fixes: e4fbcf32c860 ("nvme: identify-namespace without CAP_SYS_ADMIN") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
|
#
ea43fcee |
|
05-Dec-2022 |
Joel Granados <j.granados@samsung.com> |
nvme: allow unprivileged passthrough of Identify Controller Add unprivileged passthrough of the I/O Command Set Independent and I/O Command Set Specific Identify Controller sub-command. This will allow access to attributes (e.g. MDTS and WZSL) that are needed to effectively form passthrough I/O to the /dev/ng* character devices. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
e4fbcf32 |
|
01-Nov-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: identify-namespace without CAP_SYS_ADMIN Allow all identify-namespace variants (CNS 00h, 05h and 08h) without requiring CAP_SYS_ADMIN. The information (retrieved using id-ns) is needed to form IO commands for passthrough interface. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
855b7717 |
|
31-Oct-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: fine-granular CAP_SYS_ADMIN for nvme io commands Currently both io and admin commands are kept under a coarse-granular CAP_SYS_ADMIN check, disregarding file mode completely. $ ls -l /dev/ng* crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 In the example above, ng0n1 appears as if it may allow unprivileged read/write operation but it does not and behaves the same as ng0n2. This patch implements a shift from CAP_SYS_ADMIN to more fine-granular control for io-commands. If CAP_SYS_ADMIN is present, nothing else is checked as before. Otherwise, the following rules are in place - any admin-cmd is not allowed - vendor-specific and fabrics commands are not allowed - io-commands that can write are allowed if matching FMODE_WRITE permission is present - io-commands that read are allowed Add a helper nvme_cmd_allowed that implements above policy. Change all the callers of CAP_SYS_ADMIN to go through nvme_cmd_allowed for any decision making. Since file open mode is counted for any approval/denial, change at various places to keep file-mode information handy. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
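The access rules listed in the "fine-granular CAP_SYS_ADMIN" entry can be sketched as a user-space model. This is not the kernel's nvme_cmd_allowed helper: `cmd_request_model` and `cmd_allowed_model()` are hypothetical names, and the opcode constants (0x7f for fabrics, 0x80 as the start of the vendor-specific range, low opcode bit set for commands that send data to the device) are assumptions based on the NVMe opcode encoding rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPC_FABRICS      0x7f  /* assumed fabrics command opcode */
#define OPC_VENDOR_START 0x80  /* assumed start of vendor-specific opcodes */

/* Hypothetical description of one passthrough request. */
struct cmd_request_model {
    bool    has_sys_admin;   /* caller holds CAP_SYS_ADMIN */
    bool    is_admin_cmd;    /* admin command rather than I/O command */
    bool    open_for_write;  /* device node was opened with write access */
    uint8_t opcode;
};

static bool cmd_allowed_model(const struct cmd_request_model *req)
{
    assert(req != NULL);

    /* With CAP_SYS_ADMIN nothing else is checked. */
    if (req->has_sys_admin)
        return true;
    /* Without it, no admin command is allowed. */
    if (req->is_admin_cmd)
        return false;
    /* Vendor-specific and fabrics commands are not allowed. */
    if (req->opcode == OPC_FABRICS || req->opcode >= OPC_VENDOR_START)
        return false;
    /* Assumption: a set low opcode bit marks a command that can write. */
    if (req->opcode & 1)
        return req->open_for_write;
    /* Remaining I/O commands (reads) are allowed. */
    return true;
}
```

Later entries in this log refine the policy further, for example by allowing specific Identify commands and by consulting the Commands Supported and Effects log; the sketch covers only the rules listed in this one commit message.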
#
23fd22e5 |
|
30-Sep-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: wire up fixed buffer support for nvme passthrough If io_uring sends passthrough command with IORING_URING_CMD_FIXED flag, use the pre-registered buffer for IO (non-vectored variant). Pass the buffer/length to io_uring and get the bvec iterator for the range. Next, pass this bvec to block-layer and obtain a bio/request for subsequent processing. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20220930062749.152261-13-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4d174486 |
|
30-Sep-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: pass ubuffer as an integer This is a prep patch. Modify nvme_submit_user_cmd and nvme_map_user_request to take ubuffer as plain integer argument, and do away with nvme_to_user_ptr conversion in callers. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220930062749.152261-12-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
470e900c |
|
30-Sep-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: refactor nvme_alloc_request nvme_alloc_request expects a large number of parameters. Split this out into two functions to reduce number of parameters. First one retains the name nvme_alloc_request, while second one is named nvme_map_user_request. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220930062749.152261-8-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
38c0ddab |
|
30-Sep-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: refactor nvme_add_user_metadata Pass struct request rather than bio. It helps to kill a parameter, and some processing clean-up too. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220930062749.152261-7-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7f056357 |
|
30-Sep-2022 |
Anuj Gupta <anuj20.g@samsung.com> |
nvme: Use blk_rq_map_user_io helper Use blk_rq_map_user_io instead of duplicating the same code at different places. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/20220930062749.152261-6-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
851eb780 |
|
22-Sep-2022 |
Jens Axboe <axboe@kernel.dk> |
nvme: enable batched completions of passthrough IO Now that the normal passthrough end_io path doesn't need the request anymore, we can kill the explicit blk_mq_free_request() and just pass back RQ_END_IO_FREE instead. This enables the batched completion from freeing batches of requests at the time. This brings passthrough IO performance at least on par with bdev based O_DIRECT with io_uring. With this and batched allocations, peak performance goes from 110M IOPS to 122M IOPS. For IRQ based, passthrough is now also about 10% faster than previously, going from ~61M to ~67M IOPS. Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Co-developed-by: Stefan Roesch <shr@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c0a7ba77 |
|
21-Sep-2022 |
Jens Axboe <axboe@kernel.dk> |
nvme: split out metadata vs non metadata end_io uring_cmd completions By splitting up the metadata and non-metadata end_io handling, we can remove any request dependencies on the normal non-metadata IO path. This is in preparation for enabling the normal IO passthrough path to pass the ownership of the request back to the block layer. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Co-developed-by: Stefan Roesch <shr@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
de671d61 |
|
21-Sep-2022 |
Jens Axboe <axboe@kernel.dk> |
block: change request end_io handler to pass back a return value Everything is just converted to returning RQ_END_IO_NONE, and there should be no functional changes with this patch. In preparation for allowing the end_io handler to pass ownership back to the block layer, rather than retain ownership of the request. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
23e085b2 |
|
22-Sep-2022 |
Keith Busch <kbusch@kernel.org> |
nvme: restrict management ioctls to admin The passthrough commands already have this restriction, but the other operations do not. Require the same capabilities for all users as all of these operations, which include resets and rescans, can be disruptive. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
bc8fb906 |
|
19-Sep-2022 |
Keith Busch <kbusch@kernel.org> |
nvme: handle effects after freeing the request If a reset occurs after the scan work attempts to issue a command, the reset may quiesce the admin queue, which blocks the scan work's command from dispatching. The scan work will not be able to complete while the queue is quiesced. Meanwhile, the reset work will cancel all outstanding admin tags and wait until all requests have transitioned to idle, which includes the passthrough request. But the passthrough request won't be set to idle until after the scan_work flushes, so we're deadlocked. Fix this by handling the end effects after the request has been freed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216354 Reported-by: Jonathan Derrick <Jonathan.Derrick@solidigm.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chao Leng <lengchao@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
de97fcb3 |
|
02-Sep-2022 |
Jens Axboe <axboe@kernel.dk> |
fs: add batch and poll flags to the uring_cmd_iopoll() handler We need the poll_flags to know how to poll for the IO, and we should have the batch structure in preparation for supporting batched completions with iopoll. Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
585079b6 |
|
23-Aug-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: wire up async polling for io passthrough commands Store a cookie during submission, and use that to implement completion-polling inside the ->uring_cmd_iopoll handler. This handler makes use of existing bio poll facility. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://lore.kernel.org/r/20220823161443.49436-5-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f9ed86dc |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
nvme/host: Use the enum req_op and blk_opf_t types Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-38-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e2e53086 |
|
24-May-2022 |
Christoph Hellwig <hch@lst.de> |
blk-mq: remove the done argument to blk_execute_rq_nowait Let the caller set it together with the end_io_data instead of passing a pointless argument. Note that the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
58e5bdeb |
|
20-May-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: enable uring-passthrough for admin commands Add two new opcodes that userspace can use for admin commands: NVME_URING_CMD_ADMIN : non-vectored NVME_URING_CMD_ADMIN_VEC : vectored variant Wire up support when these are issued on controller node (/dev/nvmeX). Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
00fc2eeb |
|
20-May-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: helper for uring-passthrough checks Factor out a helper consolidating the error checks, and fix typo in a comment too. This is in preparation to support admin commands on this path. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220520090630.70394-2-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f569add4 |
|
10-May-2022 |
Anuj Gupta <anuj20.g@samsung.com> |
nvme: add vectored-io support for uring-cmd Wire up support for async passthru that takes an array of buffers (using iovec). Exposed via a new op NVME_URING_CMD_IO_VEC. Same 'struct nvme_uring_cmd' is to be used with - 1. cmd.addr as base address of user iovec array 2. cmd.data_len as count of iovec array elements Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-6-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
456cba38 |
|
10-May-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: wire-up uring-cmd support for io-passthru on char-device. Introduce handler for fops->uring_cmd(), implementing async passthru on char device (/dev/ngX). The handler supports newly introduced operation NVME_URING_CMD_IO. This operates on a new structure nvme_uring_cmd, which is similar to struct nvme_passthru_cmd64 but without the embedded 8b result field. This field is not needed since uring-cmd allows to return additional result via big-CQE. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220511054750.20432-5-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bcad2565 |
|
10-May-2022 |
Christoph Hellwig <hch@lst.de> |
nvme: refactor nvme_submit_user_cmd() Divide the work into two helpers, namely nvme_alloc_user_request and nvme_execute_user_rq. This is a prep patch, to help wiring up uring-cmd support in nvme. Signed-off-by: Christoph Hellwig <hch@lst.de> [axboe: fold in fix for assuming bio is non-NULL] Link: https://lore.kernel.org/r/20220511054750.20432-4-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e559398f |
|
15-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
nvme: remove nvme_alloc_request and nvme_alloc_request_qid Just open code the allocation + initialization in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
|
#
89377bc1 |
|
09-Feb-2022 |
Kanchan Joshi <joshi.k@samsung.com> |
nvme: add vectored-io support for user-passthrough Add a new NVME_IOCTL_IO64_CMD_VEC ioctl that works like the existing NVME_IOCTL_IO64_CMD ioctl except that it takes an array of iovecs and thus supports vectored I/O. - cmd.addr is base address of user iovec array - cmd.vec_cnt is count of iovec array elements This patch does not include vectored-variant for admin-commands as most of them are light on buffers and likely to have low invocation frequency. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
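The addressing convention in the vectored user-passthrough entry (cmd.addr as the base of the user iovec array, cmd.vec_cnt as the element count) can be sketched in user space as follows. `vec_cmd_model` is a hypothetical two-field stand-in for the real passthrough structure, and `fill_vectored()` and `iov_total_len()` are illustrative helpers, not part of any kernel or library API; only `struct iovec` from `<sys/uio.h>` is real.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical subset of the command fields named in the commit message. */
struct vec_cmd_model {
    uint64_t addr;     /* base address of the user iovec array */
    uint32_t vec_cnt;  /* number of elements in that array */
};

/* Point the command at an iovec array instead of a single flat buffer. */
static void fill_vectored(struct vec_cmd_model *cmd,
                          const struct iovec *iov, uint32_t count)
{
    assert(cmd != NULL && iov != NULL && count > 0);
    cmd->addr = (uint64_t)(uintptr_t)iov;
    cmd->vec_cnt = count;
}

/* Total payload size described by the array, summed over all segments. */
static size_t iov_total_len(const struct iovec *iov, uint32_t count)
{
    size_t total = 0;
    for (uint32_t i = 0; i < count; i++)
        total += iov[i].iov_len;
    return total;
}
```

In the non-vectored variant the same address field would instead carry the address of one contiguous buffer, which is why a separate ioctl number is needed to tell the two interpretations apart.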
#
9ea9b9c4 |
|
12-Aug-2021 |
Christoph Hellwig <hch@lst.de> |
remove the lightnvm subsystem Lightnvm supports the OCSSD 1.x and 2.0 specs which were early attempts to produce Open Channel SSDs and never made it into the NVMe spec proper. They have since been superseded by NVMe enhancements such as ZNS support. Remove the support per the deprecation schedule. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210812132308.38486-1-hch@lst.de Reviewed-by: Matias Bjørling <mb@lightnvm.io> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ae5e6886 |
|
10-Jun-2021 |
Keith Busch <kbusch@kernel.org> |
nvme: use return value from blk_execute_rq() We don't have an nvme status to report if the driver's .queue_rq() returns an error without dispatching the requested nvme command. Check the return value from blk_execute_rq() for all passthrough commands so the caller may know their command was not successful. If the command is from the target passthrough interface and fails to dispatch, synthesize the response back to the host as an internal target error. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210610214437.641245-5-kbusch@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e7d4b549 |
|
07-Jun-2021 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
nvme: factor out a nvme_validate_passthru_nsid helper Add a helper nvme_validate_passthru_nsid() to validate the nsid that removes the nsid validation and error message print code from nvme_user_cmd() and nvme_user_cmd64(). Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
85b790a7 |
|
19-May-2021 |
Christoph Hellwig <hch@lst.de> |
nvme: add a sparse annotation to nvme_ns_head_ctrl_ioctl Add the __releases annotation to tell sparse that nvme_ns_head_ctrl_ioctl is expected to unlock head->srcu. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
#
3e7d1a55 |
|
19-May-2021 |
Christoph Hellwig <hch@lst.de> |
nvme: open code nvme_put_ns_from_disk in nvme_ns_head_ctrl_ioctl nvme_ns_head_ctrl_ioctl is always used on multipath nodes, so just call srcu_read_unlock directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
#
86b4284d |
|
19-May-2021 |
Christoph Hellwig <hch@lst.de> |
nvme: open code nvme_{get,put}_ns_from_disk in nvme_ns_head_ioctl nvme_ns_head_ioctl is always used on multipath nodes, no need to deal with the de-multiplexers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
#
f423c85c |
|
19-May-2021 |
Christoph Hellwig <hch@lst.de> |
nvme: open code nvme_put_ns_from_disk in nvme_ns_head_chr_ioctl nvme_ns_head_chr_ioctl is always used on multipath nodes, so just call srcu_read_unlock and consolidate the two unlock paths. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
|
#
48145b62 |
|
22-Apr-2021 |
Minwoo Im <minwoo.im.dev@gmail.com> |
nvme: fix controller ioctl through ns_head In multipath case, we should consider namespace attachment with controllers in a subsystem when we find out the live controller for the namespace. This patch manually reverted the commit 3557a4409701 ("nvme: don't bother to look up a namespace for controller ioctls") with few more updates to nvme_ns_head_chr_ioctl which has been newly updated. Fixes: 3557a4409701 ("nvme: don't bother to look up a namespace for controller ioctls") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
2637baed |
|
21-Apr-2021 |
Minwoo Im <minwoo.im.dev@gmail.com> |
nvme: introduce generic per-namespace chardev Userspace has not been allowed to I/O to device that's failed to be initialized. This patch introduces generic per-namespace character device to allow userspace to I/O regardless of whether the block device is there or not. The chardev naming convention will be similar to the existing blkdev naming, using a ng prefix instead of nvme, i.e. - /dev/ngXnY It also supports multipath which means it will not expose chardev for the hidden namespace blkdevs (e.g., nvmeXcYnZ). If /dev/ngXnY is created for a ns_head, then I/O request will be routed to a specific controller selected by the iopolicy of the subsystem. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Tested-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
2405252a |
|
10-Apr-2021 |
Christoph Hellwig <hch@lst.de> |
nvme: move the ioctl code to a separate file Split out the ioctl code from core.c into a new file. Also update copyrights while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
|