#
fbbd5d3a |
|
29-Mar-2024 |
Damien Le Moal <dlemoal@kernel.org> |
nullblk: Fix cleanup order in null_add_dev() error path In null_add_dev(), if an error happen after initializing the resources for a zoned null block device, we must free these resources before exiting the function. To ensure this, move the out_cleanup_zone label after out_cleanup_disk as we jump to this latter label if an error happens after calling null_init_zoned_dev(). Fixes: e440626b1caf ("null_blk: pass queue_limits to blk_mq_alloc_disk") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240330005300.1503252-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0f225f87 |
|
22-Feb-2024 |
John Garry <john.g.garry@oracle.com> |
null_blk: Delete nullb.{queue_depth, nr_queues} Since commit 8b631f9cf0b8 ("null_blk: remove the bio based I/O path"), struct nullb members queue_depth and nr_queues are only ever written, so delete them. With that, null_exit_hctx() can also be deleted. Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20240222083420.6026-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e440626b |
|
20-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
null_blk: pass queue_limits to blk_mq_alloc_disk Pass the queue limits directly to blk_mq_alloc_disk instead of setting them one at a time. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0a39e550 |
|
20-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
null_blk: remove null_gendisk_register null_gendisk_register isn't a very useful abstraction given that it doesn't even allocate the gendisk. Merge it into the only caller instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
72ca2876 |
|
20-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
null_blk: refactor tag_set setup Move the tagset initialization out of null_add_dev into a new null_setup_tagset helper, and move the shared vs local differences out of null_init_tag_set into the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e32b0855 |
|
20-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
null_blk: initialize the tag_set timeout in null_init_tag_set Otherwise it will be reset to the always same value when initializing a device using the shared tag_set. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8b631f9c |
|
20-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
null_blk: remove the bio based I/O path The bio based I/O path complicates null_blk and also make various data structures, including the per-command one way bigger than required for the main request based interface. As the bio-based path is mostly used by stacking drivers and simple memory based drivers, and brd is a good example driver for the latter there is no need to have a bio based path in null_blk. Remove the path to simplify the driver and make future block layer API changes simpler by not having to deal with the complex two API setup in null_blk. Note that the queue_mode field in struct nullb_device is kept as that is simpler than having two different places to check the value and fully open coding the debugfs helpers as the existing ones won't work without a named struct member. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240220093248.3290292-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
74fa8f9c |
|
15-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
block: pass a queue_limits argument to blk_alloc_disk Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Also change blk_alloc_disk to return an ERR_PTR instead of just NULL which can't distinguish errors. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
27e32cd2 |
|
13-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
block: pass a queue_limits argument to blk_mq_alloc_disk Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
14509b74 |
|
29-Jan-2024 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
null_blk: add configfs variable shared_tags Allow setting shared_tags through configfs, which could only be set as a module parameter. For that purpose, delay tag_set initialization from null_init() to null_add_dev(). Refer tag_set.ops as the flag to check if tag_set is initialized or not. The following parameters can not be set through configfs yet: timeout requeue init_hctx Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240130042134.2463659-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
95931a24 |
|
14-Jan-2024 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
null_blk: Remove usage of the deprecated ida_simple_xx() API ida_alloc() and ida_free() should be preferred to the deprecated ida_simple_get() and ida_simple_remove(). This is less verbose. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/bf257b1078475a415cdc3344c6a750842946e367.1705222845.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
72432547 |
|
28-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
null_blk: use the default discard granularity The discard granularity now defaults to a single sector, so don't set that value explicitly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231228075545.362768-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9a9525de |
|
27-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS null_blk has some rather odd capping of the max_hw_sectors value to BLK_DEF_MAX_SECTORS, which doesn't make sense - max_hw_sector is the hardware limit, and BLK_DEF_MAX_SECTORS despite the confusing name is the default cap for the max_sectors field used for normal file system I/O. Remove all the capping, and simply leave it to the block layer or user to take up or not all of that for file system I/O. Fixes: ea17fd354ca8 ("null_blk: Allow controlling max_hw_sectors limit") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231227092305.279567-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
53f2bca2 |
|
19-Nov-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
block/null_blk: Fix double blk_mq_start_request() warning When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request, which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). Then null_queue_rqs() put these requests in the rqlist, return back to the block layer core, which would try to queue them individually again, so the warning in blk_mq_start_request() triggered. Fix it by splitting the null_queue_rq() into two parts: the first is the preparation of request, the second is the handling of request. We put the blk_mq_start_request() after the preparation part, which may fail and return back to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). And change the return type of null_handle_cmd() to void, since it always return BLK_STS_OK now. Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/ Fixes: d78bfa1346ab ("block/null_blk: add queue_rqs() support") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e1f2760b |
|
18-Sep-2023 |
Justin Stitt <justinstitt@google.com> |
null_blk: replace strncpy with strscpy `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We should favor a more robust and less ambiguous interface. We expect that both `nullb->disk_name` and `disk->disk_name` be NUL-terminated: | snprintf(nullb->disk_name, sizeof(nullb->disk_name), | "%s", config_item_name(&dev->group.cg_item)); ... | pr_info("disk %s created\n", nullb->disk_name); It seems like NUL-padding may be required due to __assign_disk_name() utilizing a memcpy as opposed to a `str*cpy` api. | static inline void __assign_disk_name(char *name, struct gendisk *disk) | { | if (disk) | memcpy(name, disk->disk_name, DISK_NAME_LEN); | else | memset(name, 0, DISK_NAME_LEN); | } Then we go and print it with `__print_disk_name` which wraps `nullb_trace_disk_name()`. | #define __print_disk_name(name) nullb_trace_disk_name(p, name) This function obviously expects a NUL-terminated string. | const char *nullb_trace_disk_name(struct trace_seq *p, char *name) | { | const char *ret = trace_seq_buffer_ptr(p); | | if (name && *name) | trace_seq_printf(p, "disk=%s, ", name); | trace_seq_putc(p, 0); | | return ret; | } >From the above, we need both 1) a NUL-terminated string and 2) a NUL-padded string. So, let's use strscpy_pad() as per Kees' suggestion from v1. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-null_blk-main-c-v3-1-10cf0a87a2c3@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d78bfa13 |
|
13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
block/null_blk: add queue_rqs() support Add batched mq_ops.queue_rqs() support in null_blk for testing. The implementation is much easy since null_blk doesn't have commit_rqs(). We simply handle each request one by one, if errors are encountered, leave them in the passed in list and return back. There is about 3.6% improvement in IOPS of fio/t/io_uring on null_blk with hw_queue_depth=256 on my test VM, from 1.09M to 1.13M. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-6-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5a26e45e |
|
01-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
null_blk: fix poll request timeout handling When doing io_uring benchmark on /dev/nullb0, it's easy to crash the kernel if poll requests timeout triggered, as reported by David. [1] BUG: kernel NULL pointer dereference, address: 0000000000000008 Workqueue: kblockd blk_mq_timeout_work RIP: 0010:null_timeout_rq+0x4e/0x91 Call Trace: ? null_timeout_rq+0x4e/0x91 blk_mq_handle_expired+0x31/0x4b bt_iter+0x68/0x84 ? bt_tags_iter+0x81/0x81 __sbitmap_for_each_set.constprop.0+0xb0/0xf2 ? __blk_mq_complete_request_remote+0xf/0xf bt_for_each+0x46/0x64 ? __blk_mq_complete_request_remote+0xf/0xf ? percpu_ref_get_many+0xc/0x2a blk_mq_queue_tag_busy_iter+0x14d/0x18e blk_mq_timeout_work+0x95/0x127 process_one_work+0x185/0x263 worker_thread+0x1b5/0x227 This is indeed a race problem between null_timeout_rq() and null_poll(). null_poll() null_timeout_rq() spin_lock(&nq->poll_lock) list_splice_init(&nq->poll_list, &list) spin_unlock(&nq->poll_lock) while (!list_empty(&list)) req = list_first_entry() list_del_init() ... blk_mq_add_to_batch() // req->rq_next = NULL spin_lock(&nq->poll_lock) // rq->queuelist->next == NULL list_del_init(&rq->queuelist) spin_unlock(&nq->poll_lock) Fix these problems by setting requests state to MQ_RQ_COMPLETE under nq->poll_lock protection, in which null_timeout_rq() can safely detect this race and early return. Note this patch just fix the kernel panic when request timeout happen. [1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/ Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8cfb9819 |
|
05-Jun-2023 |
Nitesh Shetty <nj.shetty@samsung.com> |
null_blk: Fix: memory release when memory_backed=1 Memory/pages are not freed, when unloading nullblk driver. Steps to reproduce issue 1.free -h total used free shared buff/cache available Mem: 7.8Gi 260Mi 7.1Gi 3.0Mi 395Mi 7.3Gi Swap: 0B 0B 0B 2.modprobe null_blk memory_backed=1 3.dd if=/dev/urandom of=/dev/nullb0 oflag=direct bs=1M count=1000 4.modprobe -r null_blk 5.free -h total used free shared buff/cache available Mem: 7.8Gi 1.2Gi 6.1Gi 3.0Mi 398Mi 6.3Gi Swap: 0B 0B 0B Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20230605062354.24785-1-nj.shetty@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3f89ac58 |
|
24-Apr-2023 |
Chaitanya Kulkarni <kch@nvidia.com> |
block/drivers: remove dead clear of random flag QUEUE_FLAG_ADD_RANDOM is not set before we clear it for "null_blk", "brd", "nbd", "zram", and "bcache" since by default we don't set "QUEUE_FLAG_ADD_RANDOM" to MQ ops. Remove dead clear of QUEUE_FLAG_ADD_RANDOM in above listed drivers. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> #zram Link: https://lore.kernel.org/r/20230424234628.45544-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
63f8793e |
|
16-Apr-2023 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: Always check queue mode setting from configfs Make sure to check device queue mode in the null_validate_conf() and return error for NULL_Q_RQ as we don't allow legacy I/O path, without this patch we get OOPs when queue mode is set to 1 from configfs, following are repro steps :- modprobe null_blk nr_devices=0 mkdir config/nullb/nullb0 echo 1 > config/nullb/nullb0/memory_backed echo 4096 > config/nullb/nullb0/blocksize echo 20480 > config/nullb/nullb0/size echo 1 > config/nullb/nullb0/queue_mode echo 1 > config/nullb/nullb0/power Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null) due to oops @ 0xffffffffc041c329 CPU: 42 PID: 2372 Comm: sh Tainted: G O N 6.3.0-rc5lblk+ #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk] Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20 RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246 RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00 RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002 R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740 FS: 00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0 DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: <TASK> nullb_device_power_store+0xd1/0x120 [null_blk] configfs_write_iter+0xb4/0x120 vfs_write+0x2ba/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fd4460c57a7 Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7 RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001 RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0 R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002 R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700 </TASK> Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Link: https://lore.kernel.org/r/20230416220339.43845-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bb4c19e0 |
|
27-Mar-2023 |
Akinobu Mita <akinobu.mita@gmail.com> |
block: null_blk: make fault-injection dynamically configurable per device The null_blk driver has multiple driver-specific fault injection mechanisms. Each fault injection configuration can only be specified by a module parameter and cannot be reconfigured without reloading the driver. Also, each configuration is common to all devices and is initialized every time a new device is added. This change adds the following subdirectories for each null_blk device. /sys/kernel/config/nullb/<disk>/timeout_inject /sys/kernel/config/nullb/<disk>/requeue_inject /sys/kernel/config/nullb/<disk>/init_hctx_fault_inject Each fault injection attribute can be dynamically set per device by a corresponding file in these directories. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Link: https://lore.kernel.org/r/20230327143733.14599-3-akinobu.mita@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
acc3c879 |
|
30-Mar-2023 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: use kmap_local_page() and kunmap_local() Replace the deprecated API kmap_atomic() and kunmap_atomic() with kmap_local_page() and kunmap_local() in null_flush_cache_page(). Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
fbb5615f |
|
30-Mar-2023 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: use non-deprecated lib functions Use library helper memcpy_page() to copy source page into destination instead of having duplicate code in copy_to_nullb() & copy_from_nullb(). In copy_from_nullb() also replace the memset call with zero_user(). Use library helper memset_page() to set the buffer to 0xFF instead of having duplicate code. This also removes deprecated API kmap_atomic() from copy_to_nullb() copy_from_nullb() and nullb_fill_pattern(), from :include/linux/highmem.h: "kmap_atomic - Atomically map a page for temporary usage - Deprecated!" Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20230330184926.64209-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b6402014 |
|
13-Mar-2023 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: cleanup null_queue_rq() Use a local struct request pointer variable to avoid having to dereference struct blk_mq_queue_data multiple times. While at it, also fix the function argument indentation and remove a useless "else" after a return. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
63f88659 |
|
13-Mar-2023 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Fix handling of fake timeout request When injecting a fake timeout into the null_blk driver using fail_io_timeout, the request timeout handler does not execute blk_mq_complete_request(), so the complete callback is never executed for a timedout request. The null_blk driver also has a driver-specific fake timeout mechanism which does not have this problem. Fix the problem with fail_io_timeout by using the same meachanism as null_blk internal timeout feature, using the fake_timeout field of null_blk commands. Reported-by: Akinobu Mita <akinobu.mita@gmail.com> Fixes: de3510e52b0a ("null_blk: fix command timeout completion handling") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20230314041106.19173-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0a26f327 |
|
05-Jan-2023 |
Keith Busch <kbusch@kernel.org> |
block: make BLK_DEF_MAX_SECTORS unsigned This is used as an unsigned value, so define it that way to avoid having to cast it. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230105205146.3610282-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d3a57388 |
|
30-Nov-2022 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
null_blk: support read-only and offline zone conditions In zoned mode, zones with write pointers can have conditions "read-only" or "offline". In read-only condition, zones can not be written. In offline condition, the zones can be neither written nor read. These conditions are intended for zones with media failures, then it is difficult to set those conditions to zones on real devices. To test handling of zones in the conditions, add a feature to null_blk to set up zones in read-only or offline condition. Add new configuration attributes "zone_readonly" and "zone_offline". Write a sector to the attribute files to specify the target zone to set the zone conditions. For example, following command lines do it: echo 0 > nullb1/zone_readonly echo 524288 > nullb1/zone_offline When the specified zones are already in read-only or offline condition, normal empty condition is restored to the zones. These condition changes can be done only after the null_blk device get powered, since status area of each zone is not yet allocated before power-on. Also improve zone condition checks to inhibit all commands for zones in offline conditions. In same manner, inhibit write and zone management commands for zones in read-only condition. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20221201061036.2342206-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a4e1d0b7 |
|
15-Aug-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: Change the return type of blk_mq_map_queues() into void Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig <hch@lst.de> Cc: Jason Wang <jasowang@redhat.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Doug Gilbert <dgilbert@interlog.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: John Garry <john.garry@huawei.com> Acked-by: Md Haris Iqbal <haris.iqbal@ionos.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
10b41ea1 |
|
15-Aug-2022 |
Bart Van Assche <bvanassche@acm.org> |
null_blk: Modify the behavior of null_map_queues() Instead of returning -EINVAL if an internal inconsistency is detected, fall back to a single submission queue. This patch prepares for changing the return value of the .map_queues() callbacks into void. Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220815170043.19489-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ee452a8d |
|
15-Jul-2022 |
Dan Carpenter <dan.carpenter@oracle.com> |
null_blk: fix ida error handling in null_add_dev() There needs to be some error checking if ida_simple_get() fails. Also call ida_free() if there are errors later. Fixes: 94bc02e30fb8 ("nullb: use ida to manage index") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YtEhXsr6vJeoiYhd@kili Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7012eef5 |
|
08-Jul-2022 |
Vincent Fu <vincent.fu@samsung.com> |
null_blk: add configfs variables for 2 options Allow setting via configfs these two options: no_sched shared_tag_bitmap Previously these could only be activated as module parameters. Still missing are: shared_tags timeout requeue init_hctx Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220708174943.87787-3-vincent.fu@samsung.com [axboe: fold in nullb == NULL fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
058efe00 |
|
08-Jul-2022 |
Vincent Fu <vincent.fu@samsung.com> |
null_blk: add module parameters for 4 options Add as module parameters these options: memory_backed discard mbps cache_size Previously these could only be set via configfs. Still missing is bad_blocks. The kernel test robot found a documentation formatting issue in v1 of this patch. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Fu <vincent.fu@samsung.com> Link: https://lore.kernel.org/r/20220708174943.87787-2-vincent.fu@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
eb25ad80 |
|
03-Jul-2022 |
Christophe JAILLET <christophe.jaillet@wanadoo.fr> |
block: null_blk: Use the bitmap API to allocate bitmaps Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/7c4d3116ba843fc4a8ae557dd6176352a6cd0985.1656864320.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ff07a02e |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
treewide: Rename enum req_opf into enum req_op The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9bdb4833 |
|
06-Jul-2022 |
John Garry <john.garry@huawei.com> |
blk-mq: Drop blk_mq_ops.timeout 'reserved' arg With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8b9ab626 |
|
19-Jun-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove blk_cleanup_disk blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
aacae8c4 |
|
02-Jun-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Fix null_zone_write() The bio and rq fields of struct nullb_cmd are now overlapping in a union. So we cannot use a test on ->bio being non-NULL to detect the NULL_Q_BIO queue mode. null_zone_write() use such broken test to set the sector position of a zone append write in the command bio or request. When the null_blk device uses the NULL_Q_MQ queue mode, null_zone_write() wrongly end up setting the bio sector position, resulting in the command request to be broken and random crashes following. Fix this by testing the device queue mode directly. Fixes: 8ba816b23abd ("null-blk: save memory footprint for struct nullb_cmd") Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220602120344.1365329-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
49c3b926 |
|
19-Apr-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Improve device creation with configfs Currently, the directory name used to create a nullb device through sysfs is not used as the device name, potentially causing headaches for users if devices are already created through the modprobe operation withe the nr_device module parameter not set to 0. E.g. a user can do "mkdir /sys/kernel/config/nullb/nullb0" to create a nullb device even though /dev/nullb0 was already created by modprobe. In this case, the configfs nullb device will be named nullb1, causing confusion for the user. Simplify this by using the configfs directory name as the nullb device name, always, unless another nullb device is already using the same name. E.g. if modprobe created nullb0, then: $ mkdir /sys/kernel/config/nullb/nullb0 mkdir: cannot create directory '/sys/kernel/config/nullb/nullb0': File exists will be reported to the user. To implement this, the function null_find_dev_by_name() is added to check for the existence of a nullb device with the name used for a new configfs device directory. nullb_group_make_item() uses this new function to check if the directory name can be used as the disk name. Finally, null_add_dev() is modified to use the device config item name as the disk name for a new nullb device created using configfs. The naming of devices created though modprobe remains unchanged. Of note is that it is possible for a user to create through configfs a nullb device with the same name as an existing device. E.g. $ mkdir /sys/kernel/config/nullb/null will successfully create the nullb device named "null" but this block device will however not appear under /dev/ since /dev/null already exists. Suggested-by: Joseph Bacik <josef@toxicpanda.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-5-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
db060f54 |
|
19-Apr-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Cleanup messages Use the pr_fmt() macro to prefix all null_blk pr_xxx() messages with "null_blk:" to clarify which module is printing the messages. Also add a pr_info() message in null_add_dev() to print the name of a newly created disk. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-4-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b3a0a73e |
|
19-Apr-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Cleanup device creation and deletion Introduce the null_create_dev() and null_destroy_dev() helper functions to respectivel create nullb devices on modprobe and destroy them on rmmod. The null_destroy_dev() helper avoids duplicated code in the null_init() and null_exit() functions for deleting devices. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-3-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
525323d2 |
|
19-Apr-2022 |
Damien Le Moal <damien.lemoal@opensource.wdc.com> |
block: null_blk: Fix code style issues Fix message grammar and code style issues (brackets and indentation) in null_init(). Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220420005718.3780004-2-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
fb749a87 |
|
17-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
null_blk: don't set the discard_alignment queue limit The discard_alignment queue limit is named a bit misleading means the offset into the block device at which the discard granularity starts. Setting it to the discard granularity as done by null_blk is mostly harmless but also useless. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
70200574 |
|
14-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3e3876d3 |
|
13-Apr-2022 |
Ming Lei <ming.lei@redhat.com> |
block: null_blk: end timed out poll request When poll request is timed out, it is removed from the poll list, but not completed, so the request is leaked, and never get chance to complete. Fix the issue by ending it in timeout handler. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220413084836.1571995-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
df00b1d2 |
|
22-Feb-2022 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: null_alloc_page() cleanup Remove goto labels and use direct returns as error unwinding code only needs to free t_page variable if we alloc_pages() call fails as having two labels for one kfree() can be avoided easily. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c90b6b50 |
|
22-Feb-2022 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: remove hardcoded null_alloc_page() param Only caller of null_alloc_page() is null_insert_page() unconditionally sets only parameter to GFP_NOIO and that is statically hard-coded in null_blk. There is no point in having statically hardcoded function parameter. Remove the unnecessary parameter gfp_flags and adjust the code, so it can retain existing behavior null_alloc_page() with GFP_NOIO. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3d3472f3 |
|
16-Feb-2022 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: remove hardcoded alloc_cmd() parameter Only caller of alloc_cmd() is null_submit_bio() unconditionally sets second parameter to true and that is statically hard-coded in null_blk. There is no point in having statically hardcoded function parameter. Remove the unnecessary parameter can_wait and adjust the code so it can retain existing behavior of waiting when we don't get valid nullb_cmd from __alloc_cmd() in alloc_cmd(). The restructured code avoids multiple return statements, multiple calls to __alloc_cmd() and resulting a fast path call to prepare_to_wait() due to removal of first alloc_cmd() call. Follow the pattern that we have in bio_alloc() to set the structure members in the structure allocation function in alloc_cmd() and pass bio to initialize newly allocated cmd->bio member. Follow the pattern in copy_to_nullb() to use result of one function call (null_cache_active()) to be used as a parameter to another function call (null_insert_page()), use result of alloc_cmd() as a first parameter to the null_handle_cmd() in null_submit_bio() function. This allow us to remove the local variable cmd on stack in null_submit_bio() that is in fast path. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a75110c3 |
|
15-Feb-2022 |
Chaitanya Kulkarni <kch@nvidia.com> |
null_blk: fix return value from null_add_dev() The function nullb_device_power_store() returns -ENOMEM when null_add_dev() fails. null_add_dev() can fail with return value other than -ENOMEM such as -EINVAL when Zoned Block Device option is used, see : nullb_device_power_store() null_add_dev() null_init_zoned_dev() return -EINVAL; When trying to load the module having -ENOMEM value returned on the command line creates confusion when pleanty of memory is free on the machine. Instead of hardcoding -ENOMEM return the value of null_add_dev() function. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
19768f80 |
|
23-Dec-2021 |
Ming Lei <ming.lei@redhat.com> |
block: null_blk: only set set->nr_maps as 3 if active poll_queues is > 0 It isn't correct to set set->nr_maps as 3 if g_poll_queues is > 0 since we can change it via configfs for null_blk device created there, so only set it as 3 if active poll_queues is > 0. Fixes divide zero exception reported by Shinichiro. Fixes: 2bfdbe8b7ebd ("null_blk: allow zero poll queues") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20211224010831.1521805-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c5eafd79 |
|
10-Dec-2021 |
Jens Axboe <axboe@kernel.dk> |
null_blk: cast command status to integer kernel test robot reports that sparse now triggers a warning on null_blk: >> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int ioerror @@ got restricted blk_status_t [usertype] error @@ drivers/block/null_blk/main.c:1577:55: sparse: expected int ioerror drivers/block/null_blk/main.c:1577:55: sparse: got restricted blk_status_t [usertype] error because blk_mq_add_to_batch() takes an integer instead of a blk_status_t. Just cast this to an integer to silence it, null_blk is the odd one out here since the command status is the "right" type. If we change the function type, then we'll have do that for other callers too (existing and future ones). Fixes: 2385ebf38f94 ("block: null_blk: batched complete poll requests") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2385ebf3 |
|
03-Dec-2021 |
Ming Lei <ming.lei@redhat.com> |
block: null_blk: batched complete poll requests Complete poll requests via blk_mq_add_to_batch() and blk_mq_end_request_batch(), so that we can cover batched complete code path by running null_blk test. Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0' in my test. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2bfdbe8b |
|
02-Dec-2021 |
Ming Lei <ming.lei@redhat.com> |
null_blk: allow zero poll queues There isn't any reason to not allow zero poll queues from user viewpoint. Also sometimes we need to compare io poll between poll mode and irq mode, so not allowing poll queues is bad. Fixes: 15dfc662ef31 ("null_blk: Fix handling of submit_queues and poll_queues attributes") Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1ebe2e5f |
|
22-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
block: remove GENHD_FL_EXT_DEVT All modern drivers can support extra partitions using the extended dev_t. In fact except for the ioctl method drivers never even see partitions in normal operation. So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all block devices that do support partitions, and require those that do not support partitions to explicit disallow them using GENHD_FL_NO_PART. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
94b49c3d |
|
22-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
null_blk: don't suppress partitioning information This manually reverts commit 27290b469051 ("null_blk: suppress invalid partition info"). The message in that commit log can't appearch as the flag is never checked during probing, and there is no good reason to treat null_blk special in /proc/partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
15dfc662 |
|
29-Oct-2021 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
null_blk: Fix handling of submit_queues and poll_queues attributes Commit 0a593fbbc245 ("null_blk: poll queue support") introduced the poll queue feature to null_blk. After this change, null_blk device has both submit queues and poll queues, and null_map_queues() callback maps the both queues for corresponding hardware contexts. The commit also added the device configuration attribute 'poll_queues' in same manner as the existing attribute 'submit_queues'. These attributes allow to modify the numbers of queues. However, when the new values are stored to these attributes, the values are just handled only for the corresponding queue. When number of submit_queue is updated, number of poll_queue is not counted, or vice versa. This caused inconsistent number of queues and queue mapping and resulted in null-ptr-dereference. This failure was observed in blktests block/029 and block/030. To avoid the inconsistency, fix the attribute updates to care both submit_queues and poll_queues. Introduce the helper function nullb_update_nr_hw_queues() to handle stores to the both two attributes. Add poll_queues field to the struct nullb_device to track the number in same manner as submit_queues. Add two more fields prev_submit_queues and prev_poll_queues to keep the previous values before change. In case the block layer failed to update the nr_hw_queues, refer the previous values in null_map_queues() to map queues in same manner as before change. Also add poll_queues value checks in nullb_update_nr_hw_queues() and null_validate_conf(). They ensure the poll_queues value of each device is within the range from 1 to module parameter value of poll_queues. Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20211029103926.845635-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0a593fbb |
|
17-Apr-2021 |
Jens Axboe <axboe@kernel.dk> |
null_blk: poll queue support There's currently no way to experiment with polled IO with null_blk, which seems like an oversight. This patch adds support for polled IO. We keep a list of issued IOs on submit, and then process that list when mq_ops->poll() is invoked. A new parameter is added, poll_queues. It defaults to 1 like the submit queues, meaning we'll have 1 poll queue available. Fixes-by: Bart Van Assche <bvanassche@acm.org> Fixes-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/baca710d-0f2a-16e2-60bd-b105b854e0ae@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3e08773c |
|
12-Oct-2021 |
Christoph Hellwig <hch@lst.de> |
block: switch polling to be bio based Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
10e7123d |
|
18-Aug-2021 |
Luis Chamberlain <mcgrof@kernel.org> |
null_blk: add error handling support for add_disk() We never checked for errors on add_disk() as this function returned void. Now that this is fixed, use the shiny new error handling. The actual cleanup in case of error is already handled by the caller of null_gendisk_register(). Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210818144542.19305-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
018eca45 |
|
20-Jul-2021 |
Guoqing Jiang <jiangguoqing@kylinos.cn> |
block: move some macros to blkdev.h Move them (PAGE_SECTORS_SHIFT, PAGE_SECTORS and SECTOR_MASK) to the generic header file to remove redundancy. Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn> Link: https://lore.kernel.org/r/20210721025315.1729118-1-guoqing.jiang@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2f43dbf3 |
|
14-Jun-2021 |
Christoph Hellwig <hch@lst.de> |
null_blk: remove an unused variable assignment in null_add_dev Fix up the recent blk_alloc_disk conversion. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210614060231.3965278-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6759b1a2 |
|
02-Jun-2021 |
Christoph Hellwig <hch@lst.de> |
nullb: use blk_mq_alloc_disk Use blk_mq_alloc_disk and blk_cleanup_disk to simplify the gendisk and request_queue allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
90bf3e28 |
|
02-Jun-2021 |
Colin Ian King <colin.king@canonical.com> |
null_blk: Fix null pointer dereference on nullb->disk on blk_cleanup_disk call The error handling on a nullb->disk allocation currently jumps to out_cleanup_disk that calls blk_cleanup_disk with a null pointer causing a null pointer dereference issue. Fix this by jumping to out_cleanup_tags instead. Addresses-Coverity: ("Dereference after null check") Fixes: 132226b301b5 ("null_blk: convert to blk_alloc_disk/blk_cleanup_disk") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210602100659.11058-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
132226b3 |
|
20-May-2021 |
Christoph Hellwig <hch@lst.de> |
null_blk: convert to blk_alloc_disk/blk_cleanup_disk Convert the null_blk driver to use the blk_alloc_disk and blk_cleanup_disk helpers to simplify gendisk and request_queue allocation. Note that the blk-mq mode is left with its own allocations scheme, to be handled later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://lore.kernel.org/r/20210521055116.1053587-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
cee1b215 |
|
12-Apr-2021 |
Max Gurtovoy <mgurtovoy@nvidia.com> |
null_blk: add option for managing virtual boundary This will enable changing the virtual boundary of null blk devices. For now, null blk devices didn't have any restriction on the scatter/gather elements received from the block layer. Add a module parameter and a configfs option that will control the virtual boundary. This will enable testing the efficiency of the block layer bounce buffer in case a suitable application will send discontiguous IO to the given device. Initial testing with patched FIO showed the following results (64 jobs, 128 iodepth, 1 nullb device): IO size READ (virt=false) READ (virt=true) Write (virt=false) Write (virt=true) ---------- ------------------- ----------------- ------------------- ------------------- 1k 10.7M 8482k 10.8M 8471k 2k 10.4M 8266k 10.4M 8271k 4k 10.4M 8274k 10.3M 8226k 8k 10.2M 8131k 9800k 7933k 16k 9567k 7764k 8081k 6828k 32k 8865k 7309k 5570k 5153k 64k 7695k 6586k 2682k 2617k 128k 5346k 5489k 1320k 1296k Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Link: https://lore.kernel.org/r/20210412095523.278632-1-mgurtovoy@nvidia.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
de3510e5 |
|
31-Mar-2021 |
Damien Le Moal <damien.lemoal@wdc.com> |
null_blk: fix command timeout completion handling Memory backed or zoned null block devices may generate actual request timeout errors due to the submission path being blocked on memory allocation or zone locking. Unlike fake timeouts or injected timeouts, the request submission path will call blk_mq_complete_request() or blk_mq_end_request() for these real timeout errors, causing a double completion and use after free situation as the block layer timeout handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in blk_mq_check_expired(). This problem often triggers a NULL pointer dereference such as: BUG: kernel NULL pointer dereference, address: 0000000000000050 RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20 ... Call Trace: dd_finish_request+0x56/0x80 blk_mq_free_request+0x37/0x130 null_handle_cmd+0xbf/0x250 [null_blk] ? null_queue_rq+0x67/0xd0 [null_blk] blk_mq_dispatch_rq_list+0x122/0x850 __blk_mq_do_dispatch_sched+0xbb/0x2c0 __blk_mq_sched_dispatch_requests+0x13d/0x190 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x49/0x90 process_one_work+0x26c/0x580 worker_thread+0x55/0x3c0 ? process_one_work+0x580/0x580 kthread+0x134/0x150 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x1f/0x30 This problem very often triggers when running the full btrfs xfstests on a memory-backed zoned null block device in a VM with limited amount of memory. Avoid this by executing blk_mq_complete_request() in null_timeout_rq() only for commands that are marked for a fake timeout completion using the fake_timeout boolean in struct null_cmd. For timeout errors injected through debugfs, the timeout handler will execute blk_mq_complete_request()i as before. This is safe as the submission path does not execute complete requests in this case. In null_timeout_rq(), also make sure to set the command error field to BLK_STS_TIMEOUT and to propagate this error through to the request completion. Reported-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210331225244.126426-1-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
309dca30 |
|
24-Jan-2021 |
Christoph Hellwig <hch@lst.de> |
block: store a block_device pointer in struct bio Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
eebf34a8 |
|
19-Nov-2020 |
Damien Le Moal <damien.lemoal@wdc.com> |
null_blk: Move driver into its own directory Move null_blk driver code into the new sub-directory drivers/block/null_blk. Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|