History log of /linux-master/drivers/md/dm-thin.c
Revision	Date	Author	Comments
# fa34e589	07-Feb-2024	Mike Snitzer <snitzer@kernel.org>	dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 47c00dcd	15-Nov-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: add braces around conditional code that spans lines Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# fa375646	16-Jun-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: disable discards for thin-pool if no_discard_passdown Also rename disable_passdown_if_not_supported to disable_discard_passdown_if_not_supported. And fold passdown_enabled() into only caller. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# ef6953fb	31-May-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: update .io_hints methods to not require handling discards last Removes assumptions about what might follow the discard setup code (previously the code would return early if discards not enabled). Makes it possible to add more capabilities to the end of each .io_hints method (which is the natural thing to do when adding new features). Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# c0a7a0ac	15-May-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: remove return code variable in pool_map Always returns DM_MAPIO_REMAPPED so no need for variable. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 722d9082	13-Jun-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: fix issue_discard to pass GFP_NOIO to __blkdev_issue_discard issue_discard() passes GFP_NOWAIT to __blkdev_issue_discard() despite its code assuming bio_alloc() always succeeds. Commit 3dba53a958a75 ("dm thin: use __blkdev_issue_discard for async discard support") clearly shows where things went bad: Before commit 3dba53a958a75, dm-thin.c's open-coded __blkdev_issue_discard_async() properly handled using GFP_NOWAIT. Unfortunately __blkdev_issue_discard() doesn't and it was missed during review. Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 05bdb996	08-Jun-2023	Christoph Hellwig <hch@lst.de>	block: replace fmode_t with a block-specific type for block open flags The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 3f8d3f54	25-Mar-2023	Mike Snitzer <snitzer@kernel.org>	dm bio prison v1: add dm_cell_key_has_valid_range Don't have bio_detain() BUG_ON if a dm_cell_key is beyond BIO_PRISON_MAX_RANGE or spans a boundary. Update dm-thin.c:build_key() to use dm_cell_key_has_valid_range() which will do this checking without using BUG_ON. Also update process_discard_bio() to check the discard bio that DM core passes in (having first imposed max_discard_granularity based splitting). dm_cell_key_has_valid_range() will merely WARN_ON_ONCE if it returns false because if it does: it is programmer error that should be caught with proper testing. So relax the BUG_ONs to be WARN_ON_ONCE. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# e2dd8aca	02-Mar-2023	Joe Thornber <ejt@redhat.com>	dm bio prison v1: improve concurrent IO performance Split the bio prison into multiple regions, with a separate rbtree and associated lock for each region. To get fast bio prison locking and not damage the performance of discards too much the bio-prison now stipulates that discards should not cross a BIO_PRISON_MAX_RANGE boundary. Because the range of a key (block_end - block_begin) must not exceed BIO_PRISON_MAX_RANGE: break_up_discard_bio() now ensures the data range reflected in PHYSICAL key doesn't exceed BIO_PRISON_MAX_RANGE. And splitting the thin target's discards (handled with VIRTUAL key) is achieved by updating dm-thin.c to set limits->max_discard_sectors in terms of BIO_PRISON_MAX_RANGE _and_ setting the thin and thin-pool targets' max_discard_granularity to true. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
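The range rule described in the two bio prison entries above (a key's range must not exceed BIO_PRISON_MAX_RANGE and must not span a boundary) can be illustrated with a small userspace sketch. This is not the kernel code: `MAX_RANGE_SHIFT`, `struct range_key`, `key_has_valid_range()`, `next_chunk_end()` and `count_chunks()` are hypothetical names, the value 1024 is an assumption chosen for illustration, and the empty-range check is an addition of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration only; not taken from the log above. */
#define MAX_RANGE_SHIFT 10
#define MAX_RANGE (UINT64_C(1) << MAX_RANGE_SHIFT)

/* A half-open block range [begin, end), loosely modelled on a cell key. */
struct range_key {
	uint64_t begin;
	uint64_t end;
};

/* True if the range is non-empty, at most MAX_RANGE long, and inside one region. */
static bool key_has_valid_range(const struct range_key *k)
{
	if (k->end <= k->begin)
		return false;
	if (k->end - k->begin > MAX_RANGE)
		return false;
	/* First and last block must fall in the same MAX_RANGE-aligned region. */
	return (k->begin >> MAX_RANGE_SHIFT) == ((k->end - 1) >> MAX_RANGE_SHIFT);
}

/* End of the first chunk of [begin, end), cut at the next region boundary. */
static uint64_t next_chunk_end(uint64_t begin, uint64_t end)
{
	uint64_t boundary = ((begin >> MAX_RANGE_SHIFT) + 1) << MAX_RANGE_SHIFT;

	return end < boundary ? end : boundary;
}

/* Number of boundary-respecting chunks needed to cover [begin, end). */
static unsigned int count_chunks(uint64_t begin, uint64_t end)
{
	unsigned int n = 0;

	while (begin < end) {
		begin = next_chunk_end(begin, end);
		n++;
	}
	return n;
}
```

Under these assumptions, a discard covering blocks 1000 to 3000 would be handled as three pieces (1000-1024, 1024-2048, 2048-3000), each of which passes the validity check on its own.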
# bb46c561	22-Mar-2023	Joe Thornber <ejt@redhat.com>	dm thin: speed up cell_defer_no_holder() Reduce the time that a spinlock is held in cell_defer_no_holder(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 9bbf5fee	27-Feb-2023	Coly Li <colyli@suse.de>	dm thin: fix deadlock when swapping to thin device It is a known issue that a dm-thin volume cannot be used as swap, otherwise a deadlock may happen when dm-thin internal memory demand triggers swap I/O on the dm-thin volume itself. But thanks to commit a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device"), the limit_swap_bios target flag can also be used for dm-thin to avoid the recursive I/O when it is used as swap. Fix is to simply set ti->limit_swap_bios to true in both pool_ctr() and thin_ctr(). In my test, I create a dm-thin volume /dev/vg/swap and use it as swap device. Then I run fio on another dm-thin volume /dev/vg/main and use large --blocksize to trigger swap I/O onto /dev/vg/swap. The following fio command line is used in my test: fio --name recursive-swap-io --lockmem 1 --iodepth 128 --ioengine libaio --filename /dev/vg/main --rw randrw --blocksize 1M --numjobs 32 --time_based --runtime=12h Without this fix, the whole system can be locked up within 15 seconds. With this fix, no deadlock or hung task is observed after 2 hours of running fio. Furthermore, if blocksize is changed from 1M to 128M, after around 30 seconds fio has no visible I/O, and the out-of-memory killer message shows up in the kernel log. After around 20 minutes all fio processes are killed and the whole system is back to being alive. This is exactly what is expected when recursive I/O happens on a dm-thin volume that is used as swap. Depends-on: a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device") Cc: stable@vger.kernel.org Signed-off-by: Coly Li <colyli@suse.de> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# e4f80303	16-Feb-2023	Mike Snitzer <snitzer@kernel.org>	dm thin: add cond_resched() to various workqueue loops Otherwise on resource constrained systems these workqueues may be too greedy. Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 774f13ac	07-Feb-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: declare variables static when sensible Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 6a808034	06-Feb-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: avoid using symbolic permissions Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 0ef0b471	01-Feb-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: add missing empty lines Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# a4a82ce3	26-Jan-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: correct block comments format. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 255e2646	25-Jan-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: address indent/space issues Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 86a3238c	25-Jan-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: change "unsigned" to "unsigned int" Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 3bd94003	25-Jan-2023	Heinz Mauelshagen <heinzm@redhat.com>	dm: add missing SPDX-License-Identifiers 'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has deprecated its use. Suggested-by: John Wiele <jwiele@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# c34b7ac6	06-Dec-2022	Christoph Hellwig <hch@lst.de>	block: remove bio_set_op_attrs This macro is obsolete, so replace the last few uses with open coded bi_opf assignments. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20221206144057.720846-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 19eb1650	29-Nov-2022	Luo Meng <luomeng12@huawei.com>	dm thin: resume even if in FAIL mode If a thin-pool sets fail_io while suspending, resume will fail with: device-mapper: resume ioctl on vg-thinpool failed: Invalid argument The thin-pool also can't be removed if an in-flight bio is in the deferred list. This can be easily reproduced using: echo "offline" > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4K count=1 dmsetup suspend /dev/mapper/pool mkfs.ext4 /dev/mapper/thin dmsetup resume /dev/mapper/pool The root cause is that maybe_resize_data_dev() checks fail_io and returns an error before dm_resume is called. Fix this by adding a FAIL mode check at the end of pool_preresume(). Cc: stable@vger.kernel.org Fixes: da105ed5fd7e ("dm thin metadata: introduce dm_pool_abort_metadata") Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 88430ebc	28-Nov-2022	Luo Meng <luomeng12@huawei.com>	dm thin: Fix UAF in run_timer_softirq() When dm_resume() and dm_destroy() are concurrent, it will lead to UAF, as follows: BUG: KASAN: use-after-free in __run_timers+0x173/0x710 Write of size 8 at addr ffff88816d9490f0 by task swapper/0/0 <snip> Call Trace: <IRQ> dump_stack_lvl+0x73/0x9f print_report.cold+0x132/0xaa2 _raw_spin_lock_irqsave+0xcd/0x160 __run_timers+0x173/0x710 kasan_report+0xad/0x110 __run_timers+0x173/0x710 __asan_store8+0x9c/0x140 __run_timers+0x173/0x710 call_timer_fn+0x310/0x310 pvclock_clocksource_read+0xfa/0x250 kvm_clock_read+0x2c/0x70 kvm_clock_get_cycles+0xd/0x20 ktime_get+0x5c/0x110 lapic_next_event+0x38/0x50 clockevents_program_event+0xf1/0x1e0 run_timer_softirq+0x49/0x90 __do_softirq+0x16e/0x62c __irq_exit_rcu+0x1fa/0x270 irq_exit_rcu+0x12/0x20 sysvec_apic_timer_interrupt+0x8e/0xc0 One of the concurrent UAF interleavings between the use path and the free path is: (1) use: do_resume -> __find_device_hash_cell -> dm_get -> atomic_inc(&md->holders); (2) free: dm_destroy -> __dm_destroy, which evaluates if (!dm_suspended_md(md)), then atomic_read(&md->holders) and msleep(1); (3) use: dm_resume -> __dm_resume -> dm_table_resume_targets -> pool_resume -> do_waker (adds delayed work), then dm_put -> atomic_dec(&md->holders); (4) free: dm_table_destroy -> pool_dtr -> __pool_dec -> __pool_destroy -> destroy_workqueue, kfree(pool) (pool is freed); (5) timeout: __do_softirq -> run_timer_softirq (pool has already been freed). This can be easily reproduced using: 1. create thin-pool 2. dmsetup suspend pool 3. dmsetup resume pool 4. dmsetup remove_all # Concurrent with 3 The root cause of this UAF bug is that dm_resume() adds the timer after dm_destroy() has skipped cancelling it because of the suspend status. After the timeout, run_timer_softirq() is called, but the pool has already been freed, so the UAF occurs. Therefore, cancel the timer again in __pool_destroy(). Cc: stable@vger.kernel.org Fixes: 991d9fa02da0d ("dm: add thin provisioning target") Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 3534e5a5	14-Jul-2022	Luo Meng <luomeng12@huawei.com>	dm thin: fix use-after-free crash in dm_sm_register_threshold_callback Fault inject on pool metadata device reports: BUG: KASAN: use-after-free in dm_pool_register_metadata_threshold+0x40/0x80 Read of size 8 at addr ffff8881b9d50068 by task dmsetup/950 CPU: 7 PID: 950 Comm: dmsetup Tainted: G W 5.19.0-rc6 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_address_description.constprop.0.cold+0xeb/0x3f4 kasan_report.cold+0xe6/0x147 dm_pool_register_metadata_threshold+0x40/0x80 pool_ctr+0xa0a/0x1150 dm_table_add_target+0x2c8/0x640 table_load+0x1fd/0x430 ctl_ioctl+0x2c4/0x5a0 dm_ctl_ioctl+0xa/0x10 __x64_sys_ioctl+0xb3/0xd0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This can be easily reproduced using: echo offline > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4k count=10 dmsetup load pool --table "0 20971520 thin-pool /dev/sda /dev/sdb 128 0 0" If a metadata commit fails, the transaction will be aborted and the metadata space maps will be destroyed. If a DM table reload then happens for this failed thin-pool, a use-after-free will occur in dm_sm_register_threshold_callback (called from dm_pool_register_metadata_threshold). Fix this in dm_pool_register_metadata_threshold() by returning the -EINVAL error if the thin-pool is in fail mode. Also fail pool_ctr() with a new error message: "Error registering metadata threshold". Fixes: ac8c3f3df65e4 ("dm thin: generate event when metadata threshold passed") Cc: stable@vger.kernel.org Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Luo Meng <luomeng12@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
# 44abff2c	14-Apr-2022	Christoph Hellwig <hch@lst.de>	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 70200574	14-Apr-2022	Christoph Hellwig <hch@lst.de>	block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only place that needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# b7f8dff0	10-Mar-2022	Mike Snitzer <snitzer@redhat.com>	dm: simplify dm_submit_bio_remap interface Remove the from_wq argument from dm_submit_bio_remap(). Eliminates the need for dm_submit_bio_remap() callers to know whether they are calling from a workqueue or from the original dm_submit_bio(). Add map_task to dm_io struct, record the map_task in alloc_io and clear it after all target ->map() calls have completed. Update dm_submit_bio_remap to check if 'current' matches io->map_task rather than rely on the passed 'from_wq' argument. This change really simplifies the chore of porting each DM target to using dm_submit_bio_remap() because there is no longer the risk of programming error by not completely knowing all the different contexts a particular method that calls dm_submit_bio_remap() might be used in. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# a9251281	08-Mar-2022	Mike Snitzer <snitzer@redhat.com>	dm thin: use dm_submit_bio_remap Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# 385411ff	01-Mar-2022	Christoph Hellwig <hch@lst.de>	dm: stop using bdevname Just use the %pg format specifier instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# 8ca8b1e1	14-Feb-2022	Wang Qing <wangqing@vivo.com>	dm thin: use time_is_before_jiffies instead of open coding it Use time_is_before_jiffies() to improve code readability. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
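The open-coded comparison that helpers such as time_is_before_jiffies() replace is typically a wraparound-safe check on a free-running tick counter. The following is a minimal userspace sketch of that idea, not the kernel implementation: `ticks_t`, `ticks_after()` and `deadline_is_before()` are hypothetical stand-ins, and the current time is passed in explicitly instead of being read from a global counter.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for a free-running tick counter type. */
typedef unsigned long ticks_t;

/*
 * True if a is later than b, treating the counter as circular.
 * The unsigned subtraction wraps; reinterpreting the result as signed
 * gives the right answer while the two values are less than half the
 * counter range apart.
 */
static bool ticks_after(ticks_t a, ticks_t b)
{
	return (long)(b - a) < 0;
}

/* True if the deadline already lies in the past relative to now. */
static bool deadline_is_before(ticks_t deadline, ticks_t now)
{
	return ticks_after(now, deadline);
}
```

A plain `now > deadline` test would give the wrong answer when the counter wraps between setting the deadline and checking it; the signed-difference form in this sketch still reports such a deadline as expired.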
# 07888c66	24-Jan-2022	Christoph Hellwig <hch@lst.de>	block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 28d7d128	24-Jan-2022	Christoph Hellwig <hch@lst.de>	dm-thin: use blkdev_issue_flush instead of open coding it Use blkdev_issue_flush, which uses an on-stack bio instead of an opencoded version with a bio embedded into struct pool. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 53db984e	24-Jan-2022	Christoph Hellwig <hch@lst.de>	dm: bio_alloc can't fail if it is allowed to sleep Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 6dcbb52c	17-Oct-2021	Christoph Hellwig <hch@lst.de>	dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them Use the proper helpers to read the block device size. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 8ec45662	12-Jul-2021	Tushar Sugandhi <tusharsu@linux.microsoft.com>	dm: update target status functions to support IMA measurement For device mapper targets to take advantage of IMA's measurement capabilities, the status functions for the individual targets need to be updated to handle the status_type_t case for value STATUSTYPE_IMA. Update status functions for the following target types, to log their respective attributes to be measured using IMA. 01. cache 02. crypt 03. integrity 04. linear 05. mirror 06. multipath 07. raid 08. snapshot 09. striped 10. verity For rest of the targets, handle the STATUSTYPE_IMA case by setting the measurement buffer to NULL. For IMA to measure the data on a given system, the IMA policy on the system needs to be updated to have the following line, and the system needs to be restarted for the measurements to take effect. /etc/ima/ima-policy measure func=CRITICAL_DATA label=device-mapper template=ima-buf The measurements will be reflected in the IMA logs, which are located at: /sys/kernel/security/integrity/ima/ascii_runtime_measurements /sys/kernel/security/integrity/ima/binary_runtime_measurements These IMA logs can later be consumed by various attestation clients running on the system, and send them to external services for attesting the system. The DM target data measured by IMA subsystem can alternatively be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with DM_TABLE_STATUS_CMD. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# 695902bb	19-Mar-2021	Xu Wang <vulab@iscas.ac.cn>	dm thin: remove needless request_queue NULL pointer check Since commit ff9ea323816d ("block, bdi: an active gendisk always has a request_queue associated with it") the request_queue pointer returned from bdev_get_queue() shall never be NULL. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# 21cf8661	01-Jul-2020	Christoph Hellwig <hch@lst.de>	writeback: remove bdi->congested_fn Except for pktdvd, the only places setting congested bits are file systems that allocate their own backing_dev_info structures. And pktdvd is a deprecated driver that isn't useful in stack setup either. So remove the dead congested_fn stacking infrastructure. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [axboe: fixup unused variables in bcache/request.c] Signed-off-by: Jens Axboe <axboe@kernel.dk>
# ed00aabd	01-Jul-2020	Christoph Hellwig <hch@lst.de>	block: rename generic_make_request to submit_bio_noacct generic_make_request has always been very confusingly misnamed, so rename it to submit_bio_noacct to make it clear that it is submit_bio minus accounting and a few checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# f06c03d1	13-Jan-2020	Mikulas Patocka <mpatocka@redhat.com>	dm thin: change data device's flush_bio to be member of struct pool With commit fe64369163c5 ("dm thin: don't allow changing data device during thin-pool load") it is now possible to re-parent the data device's flush_bio from the pool_c to pool structure. Doing so offers improved lifetime guarantees for the flush_bio so that the call to dm_pool_register_pre_commit_callback can now be done safely from pool_ctr(). Depends-on: fe64369163c5 ("dm thin: don't allow changing data device during thin-pool load") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# 873937e7	13-Jan-2020	Mikulas Patocka <mpatocka@redhat.com>	dm thin: don't allow changing data device during thin-pool reload The existing code allows changing the data device when the thin-pool target is reloaded. This capability is not required and only complicates device lifetime guarantees. This can cause crashes like the one reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1788596 where the kernel tries to issue a flush bio located in a structure that was already freed. Take the first step to simplifying the thin-pool's data device lifetime by disallowing changing it. Like the thin-pool's metadata device, the data device is now set in pool_create() and it cannot be changed for a given thin-pool. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
# a4a8d286	12-Jan-2020	Mike Snitzer <snitzer@redhat.com>	dm thin: fix use-after-free in metadata_pre_commit_callback dm-thin uses struct pool to hold the state of the pool. There may be multiple pool_c's pointing to a given pool, each pool_c represents a loaded target. pool_c's may be created and destroyed arbitrarily and the pool contains a reference count of pool_c's pointing to it. Since commit 694cfe7f31db3 ("dm thin: Flush data device before committing metadata") a pointer to pool_c is passed to dm_pool_register_pre_commit_callback and this function stores it in pmd->pre_commit_context. If this pool_c is freed, but pool is not (because there is another pool_c referencing it), we end up in a situation where pmd->pre_commit_context points to a freed pool_c. This causes a crash in metadata_pre_commit_callback. Fix this by moving the dm_pool_register_pre_commit_callback() from pool_ctr() to pool_preresume(). This way the in-core thin-pool metadata is only ever armed with callback data whose lifetime matches the active thin-pool target. It should be noted that this fix preserves the ability to load a thin-pool table that uses a different data block device (that contains the same data) -- though it is unclear if that capability is still useful and/or needed. Fixes: 694cfe7f31db3 ("dm thin: Flush data device before committing metadata") Cc: stable@vger.kernel.org Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>