#
522d7352 |
|
22-Feb-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: Do not include rbtree.h in blk-zoned.c The block zone code does not use RB-tree. So remove the include of linux/rbtree.h as it is not needed. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240222131724.1803520-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
71f4ecdb |
|
29-Jan-2024 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
block: remove gfp_flags from blkdev_zone_mgmt Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can drop the gfp_mask parameter from blkdev_zone_mgmt() as well as blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
587371ed |
|
07-Jan-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: Treat sequential write preferred zone type as invalid With the removal of the support for host-aware zoned devices, blk_revalidate_zone_cb() should never see the zone type BLK_ZONE_TYPE_SEQWRITE_PREF (sequential write preffered zones). Treat this zone type as being invalid. Fixes: 7437bb73f087 ("block: remove support for the host aware zone model") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240107072212.1071080-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4e33b071 |
|
28-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
block: remove disk_clear_zoned disk_clear_zoned is unused now that the last warts of the host-aware model support in sd are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20231228075141.362560-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5165799f |
|
20-Dec-2023 |
Jens Axboe <axboe@kernel.dk> |
block: export disk_clear_zoned() A previous commit split disk_set_zoned(..., bool) into not taking an argument for whether to set or clear, and instead added disk_clear_zoned() as the counterpart. However, that commit neglected to export the new symbol, causing failures for modular drivers that used it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: d73e93b4dfab ("block: simplify disk_set_zoned") Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d73e93b4 |
|
17-Dec-2023 |
Christoph Hellwig <hch@lst.de> |
block: simplify disk_set_zoned Only use disk_set_zoned to actually enable zoned device support. For clearing it, call disk_clear_zoned, which is renamed from disk_clear_zone_settings and now directly clears the zoned flag as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
03e51c4a |
|
02-Jul-2023 |
Damien Le Moal <dlemoal@kernel.org> |
scsi: block: Improve checks in blk_revalidate_disk_zones() blk_revalidate_disk_zones() implements checks of the zones of a zoned block device, verifying that the zone size is a power of 2 number of sectors, that all zones (except possibly the last one) have the same size and that zones cover the entire addressing space of the device. While these checks are appropriate to verify that well tested hardware devices have an adequate zone configurations, they lack in certain areas which may result in issues with emulated devices implemented with user drivers such as ublk or tcmu. Specifically, this function does not check if the device driver indicated support for the mandatory zone append writes, that is, if the device max_zone_append_sectors queue limit is set to a non-zero value. Additionally, invalid zones such as a zero length zone with a start sector equal to the device capacity will not be detected and result in out of bounds use of the zone bitmaps prepared with the callback function blk_revalidate_zone_cb(). Improve blk_revalidate_disk_zones() to address these inadequate checks, relying on the fact that all device drivers supporting zoned block devices must set the device zone size (chunk_sectors queue limit) and the max_zone_append_sectors queue limit before executing this function. The check for a non-zero max_zone_append_sectors value is done in blk_revalidate_disk_zones() before executing the zone report. The zone report callback function blk_revalidate_zone_cb() is also modified to add a check that a zone start is below the device capacity. The check that the zone size is a power of 2 number of sectors is moved to blk_revalidate_disk_zones() as the zone size is already known. Similarly, the number of zones of the device can be calculated in blk_revalidate_disk_zones() before executing the zone report. The kdoc comment for blk_revalidate_disk_zones() is also updated to mention that device drivers must set the device zone size and the max_zone_append_sectors queue limit before calling this function. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230703024812.76778-6-dlemoal@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
#
05bdb996 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5e4ea834 |
|
08-Jun-2023 |
Christoph Hellwig <hch@lst.de> |
block: remove unused fmode_t arguments from ioctl handlers A few ioctl handlers have fmode_t arguments that are entirely unused, remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230608110258.189493-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
19821fee |
|
17-May-2023 |
Bart Van Assche <bvanassche@acm.org> |
block: Introduce blk_rq_is_seq_zoned_write() Introduce the function blk_rq_is_seq_zoned_write(). This function will be used in later patches to preserve the order of zoned writes that require write serialization. This patch includes an optimization: instead of using rq->q->disk->part0->bd_queue to check whether or not the queue is associated with a zoned block device, use rq->q->disk->queue. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230517174230.897144-6-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4f51644c |
|
17-May-2023 |
Bart Van Assche <bvanassche@acm.org> |
block: Simplify blk_req_needs_zone_write_lock() Remove the blk_rq_is_passthrough() check because it is redundant: blk_req_needs_zone_write_lock() also calls bdev_op_is_zoned_write() and the latter function returns false for pass-through requests. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230517174230.897144-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9e0c7efa |
|
02-Feb-2023 |
Juhyung Park <qkrwngud825@gmail.com> |
block: remove more NULL checks after bdev_get_queue() bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely. [1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check") [2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()") Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e29b2100 |
|
10-Jan-2023 |
Pankaj Raghav <p.raghav@samsung.com> |
block: add a new helper bdev_{is_zone_start, offset_from_zone_start} Instead of open coding to check for zone start, add a helper to improve readability and store the logic in one place. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230110143635.77300-3-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8cafdb5a |
|
29-Sep-2022 |
Pankaj Raghav <p.raghav@samsung.com> |
block: adapt blk_mq_plug() to not plug for writes that require a zone lock The current implementation of blk_mq_plug() disables plugging for all operations that involves a transfer to the device as we just check if the last bit in op_is_write() function. Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROS as they might require a zone lock. Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220929074745.103073-2-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ff07a02e |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
treewide: Rename enum req_opf into enum req_op The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d86e716a |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: move zone related fields to struct gendisk Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
375c140c |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: use bdev based helpers in blkdev_zone_mgmt{,all} Use the bdev based helpers instead of the queue based ones to clean up the code a bit and prepare for storing all zone related fields in struct gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b623e347 |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: replace blkdev_nr_zones with bdev_nr_zones Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5d400665 |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a gendisk to blk_queue_free_zone_bitmaps Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
b3c72f81 |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a gendisk to blk_queue_clear_zone_settings Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
edd1dbc8 |
|
06-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: use bdev_is_zoned instead of open coding it Use bdev_is_zoned in all places where a block_device is available instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
73bd66d9 |
|
09-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
scsi: block: Remove REQ_OP_WRITE_SAME support No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix] Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
#
49add496 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0a3140ea |
|
24-Jan-2022 |
Chaitanya Kulkarni <kch@nvidia.com> |
block: pass a block_device and opf to blk_next_bio All callers need to set the block_device and operation, so lift that into the common code. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
86399ea0 |
|
11-Nov-2021 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
block: Hold invalidate_lock in BLKRESETZONE ioctl When BLKRESETZONE ioctl and data read race, the data read leaves stale page cache. The commit e5113505904e ("block: Discard page cache of zone reset target range") added page cache truncation to avoid stale page cache after the ioctl. However, the stale page cache still can be read during the reset zone operation for the ioctl. To avoid the stale page cache completely, hold invalidate_lock of the block device file mapping. Fixes: e5113505904e ("block: Discard page cache of zone reset target range") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4d643b66 |
|
11-Aug-2021 |
Niklas Cassel <niklas.cassel@wdc.com> |
blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMIN A user space process should not need the CAP_SYS_ADMIN capability set in order to perform a BLKREPORTZONE ioctl. Getting the zone report is required in order to get the write pointer. Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable that a user space process that can read/write from/to the device, also can get the write pointer. (Since e.g. writes have to be at the write pointer.) Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: stable@vger.kernel.org # v4.10+ Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ead3b768 |
|
11-Aug-2021 |
Niklas Cassel <niklas.cassel@wdc.com> |
blk-zoned: allow zone management send operations without CAP_SYS_ADMIN Zone management send operations (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE) should be allowed under the same permissions as write(). (write() does not require CAP_SYS_ADMIN). Additionally, other ioctls like BLKSECDISCARD and BLKZEROOUT only check if the fd was successfully opened with FMODE_WRITE. (They do not require CAP_SYS_ADMIN). Currently, zone management send operations require both CAP_SYS_ADMIN and that the fd was successfully opened with FMODE_WRITE. Remove the CAP_SYS_ADMIN requirement, so that zone management send operations match the access control requirement of write(), BLKSECDISCARD and BLKZEROOUT. Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: stable@vger.kernel.org # v4.10+ Link: https://lore.kernel.org/r/20210811110505.29649-2-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1ee533ec |
|
25-May-2021 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: improve handling of all zones reset operation SCSI, ZNS and null_blk zoned devices support resetting all zones using a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for device mapper targets creating zoned devices. In this case, a user request for resetting all zones of a device is processed in blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each zone of the device. This leads to different behaviors of the BLKRESETZONE ioctl() depending on the target device support for the reset all operation. E.g. blkzone reset /dev/sdX will reset all zones of a SCSI device using a single command that will ignore conventional, read-only or offline zones. But a dm-linear device including conventional, read-only or offline zones cannot be reset in the same manner as some of the single zone reset operations issued by blkdev_zone_mgmt() will fail. E.g.: blkzone reset /dev/dm-Y blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error To simplify applications and tools development, unify the behavior of the all-zone reset operation by modifying blkdev_zone_mgmt() to not issue a zone reset operation for conventional, read-only and offline zones, thus mimicking what an actual reset-all device command does on a device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using the new function blkdev_zone_reset_all_emulated(). The zones needing a reset are identified using a bitmap that is initialized using a zone report. Since empty zones do not need a reset, also ignore these zones. The function blkdev_zone_reset_all() is introduced for block devices natively supporting reset all operations. blkdev_zone_mgmt() is modified to call either function to execute an all zone reset request. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> [hch: split into multiple functions] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
540ad3f3 |
|
06-Apr-2021 |
Bart Van Assche <bvanassche@acm.org> |
blk-zoned: Remove the definition of blk_zone_start() Commit e76239a3748c ("block: add a report_zones method") removed the last blk_zone_start() call. Hence also remove the definition of this function. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210406200820.15180-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e5113505 |
|
11-Mar-2021 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
block: Discard page cache of zone reset target range When zone reset ioctl and data read race for a same zone on zoned block devices, the data read leaves stale page cache even though the zone reset ioctl zero clears all the zone data on the device. To avoid non-zero data read from the stale page cache after zone reset, discard page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce the helper function blkdev_truncate_zone_range() to discard the page cache. Ensure the page cache discarded by calling the helper function before and after zone reset in same manner as fallocate does. This patch can be applied back to the stable kernel version v5.10.y. Rework is needed for older stable kernels. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Cc: <stable@vger.kernel.org> # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
faa44c69 |
|
10-Mar-2021 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Fix REQ_OP_ZONE_RESET_ALL handling Similarly to a single zone reset operation (REQ_OP_ZONE_RESET), execute REQ_OP_ZONE_RESET_ALL operations with REQ_SYNC set. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
508aebb8 |
|
27-Jan-2021 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: introduce blk_queue_clear_zone_settings() Introduce the internal function blk_queue_clear_zone_settings() to cleanup all limits and resources related to zoned block devices. This new function is called from blk_queue_set_zoned() when a disk zoned model is set to BLK_ZONED_NONE. This particular case can happens when a partition is created on a host-aware scsi disk. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2afdeb23 |
|
11-Nov-2020 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Improve blk_revalidate_disk_zones() checks Improves the checks on the zones of a zoned block device done in blk_revalidate_disk_zones() by making sure that the device report_zones method did report at least one zone and that the zones reported exactly cover the entire disk capacity, that is, that there are no missing zones at the end of the disk sector range. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1a1206dc |
|
30-Jul-2020 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
block: don't do revalidate zones on invalid devices When we loose a device for whatever reason while (re)scanning zones, we trip over a NULL pointer in blk_revalidate_zone_cb, like in the following log: sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB) sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08 sd 0:0:0:0: [sda] Sense Key : 0xb [current] sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6 sda: failed to revalidate zones sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B) sda: detected capacity change from 14000519643136 to 0 ================================================================== BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550 Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58 CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn Call Trace: dump_stack+0x7d/0xb0 ? blk_revalidate_zone_cb+0x1b7/0x550 kasan_report.cold+0x5/0x37 ? blk_revalidate_zone_cb+0x1b7/0x550 check_memory_region+0x145/0x1a0 blk_revalidate_zone_cb+0x1b7/0x550 sd_zbc_parse_report+0x1f1/0x370 ? blk_req_zone_write_trylock+0x200/0x200 ? sectors_to_logical+0x60/0x60 ? blk_req_zone_write_trylock+0x200/0x200 ? blk_req_zone_write_trylock+0x200/0x200 sd_zbc_report_zones+0x3c4/0x5e0 ? sd_dif_config_host+0x500/0x500 blk_revalidate_disk_zones+0x231/0x44d ? _raw_write_lock_irqsave+0xb0/0xb0 ? blk_queue_free_zone_bitmaps+0xd0/0xd0 sd_zbc_read_zones+0x8cf/0x11a0 sd_revalidate_disk+0x305c/0x64e0 ? __device_add_disk+0x776/0xf20 ? read_capacity_16.part.0+0x1080/0x1080 ? blk_alloc_devt+0x250/0x250 ? create_object.isra.0+0x595/0xa20 ? kasan_unpoison_shadow+0x33/0x40 sd_probe+0x8dc/0xcd2 really_probe+0x20e/0xaf0 __driver_attach_async_helper+0x249/0x2d0 async_run_entry_fn+0xbe/0x560 process_one_work+0x764/0x1290 ? _raw_read_unlock_irqrestore+0x30/0x30 worker_thread+0x598/0x12f0 ? __kthread_parkme+0xc6/0x1b0 ? schedule+0xed/0x2c0 ? process_one_work+0x1290/0x1290 kthread+0x36b/0x440 ? kthread_create_worker_on_cpu+0xa0/0xa0 ret_from_fork+0x22/0x30 ================================================================== When the device is already gone we end up with the following scenario: The device's capacity is 0 and thus the number of zones will be 0 as well. When allocating the bitmap for the conventional zones, we then trip over a NULL pointer. So if we encounter a zoned block device with a 0 capacity, don't dare to revalidate the zones sizes. Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
82394db7 |
|
29-Jun-2020 |
Matias Bjørling <matias.bjorling@wdc.com> |
block: add capacity field to zone descriptors In the zoned storage model, the sectors within a zone are typically all writeable. With the introduction of the Zoned Namespace (ZNS) Command Set in the NVM Express organization, the model was extended to have a specific writeable capacity. Extend the zone descriptor data structure with a zone capacity field to indicate to the user how many sectors in a zone are writeable. Introduce backward compatibility in the zone report ioctl by extending the zone report header data structure with a flags field to indicate if the capacity field is available. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
|
#
e732671a |
|
12-May-2020 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Modify revalidate zones Modify the interface of blk_revalidate_disk_zones() to add an optional driver callback function that a driver can use to extend processing done during zone revalidation. The callback, if defined, is executed with the device request queue frozen, after all zones have been inspected. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1392d370 |
|
12-May-2020 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
block: introduce blk_req_zone_write_trylock Introduce blk_req_zone_write_trylock(), which either grabs the write-lock for a sequential zone or returns false, if the zone is already locked. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
02694e86 |
|
25-Mar-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
block: add a zone condition debug helper Add a helper to stringify the zone conditions. We use this helper in the next patch to track zone conditions in tracepoints. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
11bde986 |
|
12-Feb-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
block, zoned: fix integer overflow with BLKRESETZONE et al Check for overflow in addition before checking for end-of-block-device. Steps to reproduce: #define _GNU_SOURCE 1 #include <sys/ioctl.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> typedef unsigned long long __u64; struct blk_zone_range { __u64 sector; __u64 nr_sectors; }; #define BLKRESETZONE _IOW(0x12, 131, struct blk_zone_range) int main(void) { int fd = open("/dev/nullb0", O_RDWR|O_DIRECT); struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL}; ioctl(fd, BLKRESETZONE, &zr); return 0; } BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0 Write of size 8 at addr 0000000000000040 by task a.out/1590 CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c02bfa #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 Call Trace: dump_stack+0x76/0xa0 __kasan_report.cold+0x5/0x3e kasan_report+0xe/0x20 submit_bio_wait+0x74/0xe0 blkdev_zone_mgmt+0x26f/0x2a0 blkdev_zone_mgmt_ioctl+0x14b/0x1b0 blkdev_ioctl+0xb28/0xe60 block_ioctl+0x69/0x80 ksys_ioctl+0x3af/0xa50 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8e42d239 |
|
07-Jan-2020 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
block: mark zone-mgmt bios with REQ_SYNC In the current implementation, final zone-mgmt request is issued with submit_bio_wait() which marks the bio REQ_SYNC. This is needed since immediate action is expected for zone-mgmt requests as these are blocking operations. This also bypasses the scheduler in the blk_mq_make_request() and dispatches the request directly into the hw ctx. This patch marks all the chained bios REQ_SYNC so that we can have above-mentioned behavior for non-final bios also. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6c6b3549 |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: set the zone size in blk_revalidate_disk_zones atomically The current zone revalidation code has a major problem in that it doesn't update the zone size and q->nr_zones atomically, leading to a short window where an out of bounds access to the zone arrays is possible. To fix this move the setting of the zone size into the crticial sections blk_revalidate_disk_zones so that it gets updated together with the zone bitmaps and q->nr_zones. This also slightly simplifies the caller as it deducts the zone size from the report_zones. This change also allows to check for a power of two zone size in generic code. Reported-by: Hans Holmberg <hans@owltronix.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ae58954d |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: don't handle bio based drivers in blk_revalidate_disk_zones bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e94f5819 |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: allocate the zone bitmaps lazily Allocate the conventional zone bitmap and the sequential zone locking bitmap only when we find a zone of the respective type. This avoids wasting memory on the conventional zone bitmap for devices that only have sequential zones, and will also prepare for other future changes. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f216fdd7 |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: replace seq_zones_bitmap with conv_zones_bitmap Invert the meaning of seq_zones_bitmap by keeping a bitmap of conventional zones. This allows not having a bitmap for devices that do not have conventional zones. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9b38bb4b |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: simplify blkdev_nr_zones Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bb556282 |
|
03-Dec-2019 |
Christoph Hellwig <hch@lst.de> |
block: remove the empty line at the end of blk-zoned.c Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d4100351 |
|
10-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
block: rework zone reporting Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5eac3eb3 |
|
10-Nov-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Remove partition support for zoned block devices No known partitioning tool supports zoned block devices, especially the host managed flavor with strong sequential write constraints. Furthermore, there are also no known user nor use cases for partitioned zoned block devices. This patch removes partition device creation for zoned block devices, which allows simplifying the processing of zone commands for zoned block devices. A warning is added if a partition table is found on the device. For report zones operations no zone sector information remapping is necessary anymore, simplifying the code. Of note is that remapping of zone reports for DM targets is still necessary as done by dm_remap_zone_report(). Similarly, remaping of a zone reset bio is not necessary anymore. Testing for the applicability of the zone reset all request also becomes simpler and only needs to check that the number of sectors of the requested zone range is equal to the disk capacity. Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ceeb373a |
|
10-Nov-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Simplify report zones execution All kernel users of blkdev_report_zones() as well as applications use through ioctl(BLKZONEREPORT) expect to potentially get less zone descriptors than requested. As such, the use of the internal report zones command execution loop implemented by blk_report_zones() is not necessary and can even be harmful to performance by causing the execution of inefficient small zones report command to service the reminder of a requested zone array. This patch removes blk_report_zones(), simplifying the code. Also remove a now incorrect comment in dm_blk_report_zones(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Javier Gonzalez <javier@javigon.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c98c3d09 |
|
10-Nov-2019 |
Christoph Hellwig <hch@lst.de> |
block: cleanup the !zoned case in blk_revalidate_disk_zones blk_revalidate_disk_zones is never called for non-zoned devices. Just return early and warn instead of trying to handle this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d9dd7308 |
|
10-Nov-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Enhance blk_revalidate_disk_zones() For ZBC and ZAC zoned devices, the scsi driver revalidation processing implemented by sd_revalidate_disk() includes a call to sd_zbc_read_zones() which executes a full disk zone report used to check that all zones of the disk are the same size. This processing is followed by a call to blk_revalidate_disk_zones(), used to initialize the device request queue zone bitmaps (zone type and zone write lock bitmaps). To do so, blk_revalidate_disk_zones() also executes a full device zone report to obtain zone types. As a result, the entire zoned block device revalidation process includes two full device zone report. By moving the zone size checks into blk_revalidate_disk_zones(), this process can be optimized to a single full device zone report, leading to shorter device scan and revalidation times. This patch implements this optimization, reducing the original full device zone report implemented in sd_zbc_check_zones() to a single, small, report zones command execution to obtain the size of the first zone of the device. Checks whether all zones of the device are the same size as the first zone size are moved to the generic blk_check_zone() function called from blk_revalidate_disk_zones(). This optimization also has the following benefits: 1) fewer memory allocations in the scsi layer during disk revalidation as the potentailly large buffer for zone report execution is not needed. 2) Implement zone checks in a generic manner, reducing the burden on device driver which only need to obtain the zone size and check that this size is a power of 2 number of LBAs. Any new type of zoned block device will benefit from this. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e876df1f |
|
27-Oct-2019 |
Ajay Joshi <ajay.joshi@wdc.com> |
block: add zone open, close and finish ioctl support Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE to allow applications to control the condition of zones on a zoned block device through the execution of the REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6c1b1da5 |
|
27-Oct-2019 |
Ajay Joshi <ajay.joshi@wdc.com> |
block: add zone open, close and finish operations Zoned block devices (ZBC and ZAC devices) allow an explicit control over the condition (state) of zones. The operations allowed are: * Open a zone: Transition to open condition to indicate that a zone will actively be written * Close a zone: Transition to closed condition to release the drive resources used for writing to a zone * Finish a zone: Transition an open or closed zone to the full condition to prevent write operations To enable this control for in-kernel zoned block device users, define the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH as well as the generic function blkdev_zone_mgmt() for submitting these operations on a range of zones. This results in blkdev_reset_zones() removal and replacement with this new zone magement function. Users of blkdev_reset_zones() (f2fs and dm-zoned) are updated accordingly. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c7a1d926 |
|
27-Oct-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Simplify REQ_OP_ZONE_RESET_ALL handling There is no need for the function __blkdev_reset_all_zones() as REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones() bio loop with an early break from the loop. This patch removes this function and modifies blkdev_reset_zones(), simplifying the code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a84324d2 |
|
27-Oct-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Remove REQ_OP_ZONE_RESET plugging REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests do not have a size and are never sequential due to the zone start sector position required for their execution. As a result, there is no point in using a plug around blkdev_reset_zones() bio issuing loop. This patch removes this unnecessary plugging. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6e33dbf2 |
|
01-Aug-2019 |
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> |
blk-zoned: implement REQ_OP_ZONE_RESET_ALL This implements REQ_OP_ZONE_RESET_ALL as a special case of the block device zone reset operations where we just simply issue bio with the newly introduced req op. We issue this req op when the number of sectors is equal to the device's partition's number of sectors and device has no partitions. We also add support so that blk_op_str() can print the new reset-all zone operation. This patch also adds a generic make request check for newly introduced REQ_OP_ZONE_RESET_ALL req_opf. We simply return error when queue is zoned and reset-all flag is not set for REQ_OP_ZONE_RESET_ALL. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
26202928 |
|
30-Jun-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Limit zone array allocation size Limit the size of the struct blk_zone array used in blk_revalidate_disk_zones() to avoid memory allocation failures leading to disk revalidation failure. Also further reduce the likelyhood of such failures by using kvcalloc() (that is vmalloc()) instead of allocating contiguous pages with alloc_pages(). Fixes: 515ce6061312 ("scsi: sd_zbc: Fix sd_zbc_report_zones() buffer allocation") Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bd976e52 |
|
30-Jun-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Kill gfp_t argument of blkdev_report_zones() Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
113ab72e |
|
09-Jul-2019 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Fix potential overflow in blk_report_zones() For large values of the number of zones reported and/or large zone sizes, the sector increment calculated with blk_queue_zone_sectors(q) * n in blk_report_zones() loop can overflow the unsigned int type used for the calculation as both "n" and blk_queue_zone_sectors() value are unsigned int. E.g. for a device with 256 MB zones (524288 sectors), overflow happens with 8192 or more zones reported. Changing the return type of blk_queue_zone_sectors() to sector_t, fixes this problem and avoids overflow problem for all other callers of this helper too. The same change is also applied to the bdev_zone_sectors() helper. Fixes: e76239a3748c ("block: add a report_zones method") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3dcf60bc |
|
30-Apr-2019 |
Christoph Hellwig <hch@lst.de> |
block: add SPDX tags to block layer files missing licensing information Various block layer files do not have any licensing information at all. Add SPDX tags for the default kernel GPLv2 license to those. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
927b6b2d |
|
11-Dec-2018 |
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> |
block: Fix null_blk_zoned creation failure with small number of zones null_blk_zoned creation fails if the number of zones specified is equal to or is smaller than 64 due to a memory allocation failure in blk_alloc_zones(). With such a small number of zones, the required memory size for all zones descriptors fits in a single page, and the page order for alloc_pages_node() is zero. Allow this value in blk_alloc_zones() for the allocation to succeed. Fixes: bf5054569653 "block: Introduce blk_revalidate_disk_zones()" Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
344e9ffc |
|
15-Nov-2018 |
Jens Axboe <axboe@kernel.dk> |
block: add queue_is_mq() helper Various spots check for q->mq_ops being non-NULL, but provide a helper to do this instead. Where the ->mq_ops != NULL check is redundant, remove it. Since mq == rq-based now that legacy is gone, get rid of the queue_is_rq_based() and just use queue_is_mq() everywhere. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bf505456 |
|
12-Oct-2018 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Introduce blk_revalidate_disk_zones() Drivers exposing zoned block devices have to initialize and maintain correctness (i.e. revalidate) of the device zone bitmaps attached to the device request queue (seq_zones_bitmap and seq_zones_wlock). To simplify coding this, introduce a generic helper function blk_revalidate_disk_zones() suitable for most (and likely all) cases. This new function always update the seq_zones_bitmap and seq_zones_wlock bitmaps as well as the queue nr_zones field when called for a disk using a request based queue. For a disk using a BIO based queue, only the number of zones is updated since these queues do not have schedulers and so do not need the zone bitmaps. With this change, the zone bitmap initialization code in sd_zbc.c can be replaced with a call to this function in sd_zbc_read_zones(), which is called from the disk revalidate block operation method. A call to blk_revalidate_disk_zones() is also added to the null_blk driver for devices created with the zoned mode enabled. Finally, to ensure that zoned devices created with dm-linear or dm-flakey expose the correct number of zones through sysfs, a call to blk_revalidate_disk_zones() is added to dm_table_set_restrictions(). The zone bitmaps allocated and initialized with blk_revalidate_disk_zones() are freed automatically from __blk_release_queue() using the block internal function blk_queue_free_zone_bitmaps(). Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
e76239a3 |
|
12-Oct-2018 |
Christoph Hellwig <hch@lst.de> |
block: add a report_zones method Dispatching a report zones command through the request queue is a major pain due to the command reply payload rewriting necessary. Given that blkdev_report_zones() is executing everything synchronously, implement report zones as a block device file operation instead, allowing major simplification of the code in many places. sd, null-blk, dm-linear and dm-flakey being the only block device drivers supporting exposing zoned block devices, these drivers are modified to provide the device side implementation of the report_zones() block device file operation. For device mappers, a new report_zones() target type operation is defined so that the upper block layer calls blkdev_report_zones() can be propagated down to the underlying devices of the dm targets. Implementation for this new operation is added to the dm-linear and dm-flakey targets. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Changed method block_device argument to gendisk * Various bug fixes and improvements * Added support for null_blk, dm-linear and dm-flakey. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a2d6b3a2 |
|
12-Oct-2018 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Improve zone reset execution There is no need to synchronously execute all REQ_OP_ZONE_RESET BIOs necessary to reset a range of zones. Similarly to what is done for discard BIOs in blk-lib.c, all zone reset BIOs can be chained and executed asynchronously and a synchronous call done only for the last BIO of the chain. Modify blkdev_reset_zones() to operate similarly to blkdev_issue_discard() using the next_bio() helper for chaining BIOs. To avoid code duplication of that function in blk_zoned.c, rename next_bio() into blk_next_bio() and declare it as a block internal function in blk.h. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2e85fbaf |
|
12-Oct-2018 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Limit allocation of zone descriptors for report zones There is no point in allocating more zone descriptors than the number of zones a block device has for doing a zone report. Avoid doing that in blkdev_report_zones_ioctl() by limiting the number of zone decriptors allocated internally to process the user request. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a91e1380 |
|
12-Oct-2018 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Introduce blkdev_nr_zones() helper Introduce the blkdev_nr_zones() helper function to get the total number of zones of a zoned block device. This number is always 0 for a regular block device (q->limits.zoned == BLK_ZONED_NONE case). Replace hard-coded number of zones calculation in dmz_get_zoned_device() with a call to this helper. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f441108f |
|
15-Jun-2018 |
Bart Van Assche <bvanassche@acm.org> |
block: Remove a superfluous cast from blkdev_report_zones() No cast is necessary when assigning a non-void pointer to a void pointer. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
344476e1 |
|
12-Jun-2018 |
Kees Cook <keescook@chromium.org> |
treewide: kvmalloc() -> kvmalloc_array() The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This patch replaces cases of: kvmalloc(a * b, gfp) with: kvmalloc_array(a * b, gfp) as well as handling cases of: kvmalloc(a * b * c, gfp) with: kvmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kvmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kvmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kvmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kvmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kvmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kvmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kvmalloc( - sizeof(char) * COUNT + COUNT , ...) | kvmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kvmalloc + kvmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kvmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kvmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kvmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kvmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kvmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kvmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kvmalloc(C1 * C2 * C3, ...) | kvmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kvmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kvmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kvmalloc(sizeof(THING) * C2, ...) | kvmalloc(sizeof(TYPE) * C2, ...) | kvmalloc(C1 * C2 * C3, ...) | kvmalloc(C1 * C2, ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kvmalloc + kvmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kvmalloc + kvmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
327ea4ad |
|
22-May-2018 |
Bart Van Assche <bvanassche@acm.org> |
blkdev_report_zones_ioctl(): Use vmalloc() to allocate large buffers Avoid that complaints similar to the following appear in the kernel log if the number of zones is sufficiently large: fio: page allocation failure: order:9, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null) Call Trace: dump_stack+0x63/0x88 warn_alloc+0xf5/0x190 __alloc_pages_slowpath+0x8f0/0xb0d __alloc_pages_nodemask+0x242/0x260 alloc_pages_current+0x6a/0xb0 kmalloc_order+0x18/0x50 kmalloc_order_trace+0x26/0xb0 __kmalloc+0x20e/0x220 blkdev_report_zones_ioctl+0xa5/0x1a0 blkdev_ioctl+0x1ba/0x930 block_ioctl+0x41/0x50 do_vfs_ioctl+0xaa/0x610 SyS_ioctl+0x79/0x90 do_syscall_64+0x79/0x1b0 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Shaun Tancheff <shaun.tancheff@seagate.com> Cc: Damien Le Moal <damien.lemoal@hgst.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Hannes Reinecke <hare@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
56c4bddb |
|
08-Mar-2018 |
Bart Van Assche <bvanassche@acm.org> |
block: Suppress kernel-doc warnings triggered by blk-zoned.c Avoid that building with W=1 causes the kernel-doc tool to complain about undocumented function arguments for the blk-zoned.c source file. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6cc77e9c |
|
20-Dec-2017 |
Christoph Hellwig <hch@lst.de> |
block: introduce zoned block devices zone write locking Components relying only on the request_queue structure for accessing block devices (e.g. I/O schedulers) have a limited knowledged of the device characteristics. In particular, the device capacity cannot be easily discovered, which for a zoned block device also result in the inability to easily know the number of zones of the device (the zone size is indicated by the chunk_sectors field of the queue limits). Introduce the nr_zones field to the request_queue structure to simplify access to this information. Also, add the bitmap seq_zone_bitmap which indicates which zones of the device are sequential zones (write preferred or write required) and the bitmap seq_zones_wlock which indicates if a zone is write locked, that is, if a write request targeting a zone was dispatched to the device. These fields are initialized by the low level block device driver (sd.c for ZBC/ZAC disks). They are not initialized by stacking drivers (device mappers) handling zoned block devices (e.g. dm-linear). Using this, I/O schedulers can introduce zone write locking to control request dispatching to a zoned block device and avoid write request reordering by limiting to at most a single write request per zone outside of the scheduler at any time. Based on previous patches from Damien Le Moal. Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Fixed comments and identation in blkdev.h * Changed helper functions * Fixed this commit message Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
74d46992 |
|
23-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f99e8648 |
|
12-Jan-2017 |
Damien Le Moal <damien.lemoal@wdc.com> |
block: Rename blk_queue_zone_size and bdev_zone_size All block device data fields and functions returning a number of 512B sectors are by convention named xxx_sectors while names in the form xxx_size are generally used for a number of bytes. The blk_queue_zone_size and bdev_zone_size functions were not following this convention so rename them. No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Collapsed the two patches, they were nonsensically split and broke bisection. Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
3c4da758 |
|
21-Oct-2016 |
Arnd Bergmann <arnd@arndb.de> |
block: zoned: fix harmless maybe-uninitialized warning The blkdev_report_zones produces a harmless warning when -Wmaybe-uninitialized is set, after gcc gets a little confused about the multiple 'goto' here: block/blk-zoned.c: In function 'blkdev_report_zones': block/blk-zoned.c:188:13: error: 'nz' may be used uninitialized in this function [-Werror=maybe-uninitialized] Moving the assignment to nr_zones makes this a little simpler while also avoiding the warning reliably. I'm removing the extraneous initialization of 'int ret' in the same patch, as that is semi-related and could cause an uninitialized use of that variable to not produce a warning. Fixes: 6a0cb1bc106f ("block: Implement support for zoned block devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
3ed05a98 |
|
18-Oct-2016 |
Shaun Tancheff <shaun@tancheff.com> |
blk-zoned: implement ioctls Adds the new BLKREPORTZONE and BLKRESETZONE ioctls for respectively obtaining the zone configuration of a zoned block device and resetting the write pointer of sequential zones of a zoned block device. The BLKREPORTZONE ioctl maps directly to a single call of the function blkdev_report_zones. The zone information result is passed as an array of struct blk_zone identical to the structure used internally for processing the REQ_OP_ZONE_REPORT operation. The BLKRESETZONE ioctl maps to a single call of the blkdev_reset_zones function. Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
6a0cb1bc |
|
18-Oct-2016 |
Hannes Reinecke <hare@suse.de> |
block: Implement support for zoned block devices Implement zoned block device zone information reporting and reset. Zone information are reported as struct blk_zone. This implementation does not differentiate between host-aware and host-managed device models and is valid for both. Two functions are provided: blkdev_report_zones for discovering the zone configuration of a zoned block device, and blkdev_reset_zones for resetting the write pointer of sequential zones. The helper function blk_queue_zone_size and bdev_zone_size are also provided for, as the name suggest, obtaining the zone size (in 512B sectors) of the zones of the device. Signed-off-by: Hannes Reinecke <hare@suse.de> [Damien: * Removed the zone cache * Implement report zones operation based on earlier proposal by Shaun Tancheff <shaun.tancheff@seagate.com>] Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|