History log of /linux-master/drivers/md/dm-cache-metadata.c
Revision Date Author Comments
# ac4149ba 25-Sep-2023 Justin Stitt <justinstitt@google.com>

dm cache metadata: replace deprecated strncpy with strscpy

`strncpy` is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.

It seems `cmd->policy_name` is intended to be NUL-terminated based on a
now changed line of code from Commit (c6b4fcbad044e6ff "dm: add cache
target"):
| if (strcmp(cmd->policy_name, policy_name)) { // ...

However, now a length-bounded strncmp is used:
| if (strncmp(cmd->policy_name, policy_name, sizeof(cmd->policy_name)))
... which means NUL-terminated may not strictly be required. However, I
believe the intent of the code is clear and we should maintain
NUL-termination of policy_names.

Moreover, __begin_transaction_flags() zero-allocates `cmd` before
calling read_superblock_fields():
| cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);

Also, `disk_super->policy_name` is zero-initialized
| memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
... therefore any NUL-padding is redundant.

Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 8a0e8bb1 11-Sep-2023 Qi Zheng <zhengqi.arch@bytedance.com>

mm: shrinker: convert shrinker_rwsem to mutex

Now there are no readers of shrinker_rwsem, so we can simply replace it
with mutex lock.

[akpm@linux-foundation.org: update the fix to alloc_shrinker_info()]
Link: https://lkml.kernel.org/r/20230911094444.68966-46-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 47a7c01c 09-Jun-2023 Qi Zheng <zhengqi.arch@bytedance.com>

Revert "mm: shrinkers: convert shrinker_rwsem to mutex"

Patch series "revert shrinker_srcu related changes".


This patch (of 7):

This reverts commit cf2e309ebca7bb0916771839f9b580b06c778530.

Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec
test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make
global slab shrink lockless"). The root cause is that SRCU has to be
careful to not frequently check for SRCU read-side critical section exits.
Therefore, even if no one is currently in the SRCU read-side critical
section, synchronize_srcu() cannot return quickly. That's why
unregister_shrinker() has become slower.

After discussion, we will try to use the refcount+RCU method [2] proposed
by Dave Chinner to continue to re-implement the lockless slab shrink. So
revert the shrinker_mutex back to shrinker_rwsem first.

[1]. https://lore.kernel.org/lkml/202305230837.db2c233f-yujie.liu@intel.com/
[2]. https://lore.kernel.org/lkml/ZIJhou1d55d4H1s0@dread.disaster.area/

Link: https://lkml.kernel.org/r/20230609081518.3039120-1-qi.zheng@linux.dev
Link: https://lkml.kernel.org/r/20230609081518.3039120-2-qi.zheng@linux.dev
Reported-by: kernel test robot <yujie.liu@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202305230837.db2c233f-yujie.liu@intel.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# cf2e309e 13-Mar-2023 Qi Zheng <zhengqi.arch@bytedance.com>

mm: shrinkers: convert shrinker_rwsem to mutex

Now there are no readers of shrinker_rwsem, so we can simply replace it
with mutex lock.

Link: https://lkml.kernel.org/r/20230313112819.38938-9-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill Tkhai <tkhai@ya.ru>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 1c131886 06-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: prefer '"%s...", __func__'

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 0ef0b471 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing empty lines

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# a4a82ce3 26-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: correct block comments format.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 96422281 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: address space issues relative to switch/while/for/...

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 86a3238c 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: change "unsigned" to "unsigned int"

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3bd94003 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing SPDX-License-Indentifiers

'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.

Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 352b837a 30-Nov-2022 Mike Snitzer <snitzer@kernel.org>

dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort

Same ABBA deadlock pattern fixed in commit 4b60f452ec51 ("dm thin: Fix
ABBA deadlock between shrink_slab and dm_pool_abort_metadata") to
DM-cache's metadata.

Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# d254c369 04-May-2022 Guo Zhengkui <guozhengkui@vivo.com>

dm cache metadata: remove unnecessary variable in __dump_mapping

Fix the following coccicheck warning:

drivers/md/dm-cache-metadata.c:1512:5-6: Unneeded variable: "r".
Return "0" on line 1520.

Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 6dcbb52c 17-Oct-2021 Christoph Hellwig <hch@lst.de>

dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them

Use the proper helpers to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1e0dcca9 09-Jan-2021 Christoph Hellwig <hch@lst.de>

dm: use bdev_read_only to check if a device is read-only

dm-thin and dm-cache also work on partitions, so use the proper
interface to check if the device is read-only.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d16ff19e 01-Sep-2020 Ye Bin <yebin10@huawei.com>

dm cache metadata: Avoid returning cmd->bm wild pointer on error

Maybe __create_persistent_data_objects() caller will use PTR_ERR as a
pointer, it will lead to some strange things.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# e28adc3b 17-Apr-2019 Nikos Tsironis <ntsironis@arrikto.com>

dm cache metadata: Fix loading discard bitset

Add missing dm_bitset_cursor_next() to properly advance the bitset
cursor.

Otherwise, the discarded state of all blocks is set according to the
discarded state of the first block.

Fixes: ae4a46a1f6 ("dm cache metadata: use bitset cursor api to load discard bitset")
Cc: stable@vger.kernel.org
Signed-off-by: Nikos Tsironis <ntsironis@arrikto.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 687cf441 09-Nov-2018 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty()

Otherwise dm_bitset_cursor_begin() return -ENODATA. Other calls to
dm_bitset_cursor_begin() have similar negative checks.

Fixes inability to create a cache in passthrough mode (even though doing
so makes no sense).

Fixes: 0d963b6e65 ("dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty")
Cc: stable@vger.kernel.org
Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4561ffca 24-Sep-2018 Joe Thornber <ejt@redhat.com>

dm cache metadata: ignore hints array being too small during resize

Commit fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to
on-disk superblock") enabled previously written policy hints to be
used after a cache is reactivated. But in doing so the cache
metadata's hint array was left exposed to out of bounds access because
on resize the metadata's on-disk hint array wasn't ever extended.

Fix this by ignoring that there are no on-disk hints associated with the
newly added cache blocks. An expanded on-disk hint array is later
rewritten upon the next clean shutdown of the cache.

Fixes: fd2fa9541 ("dm cache metadata: save in-core policy_hint_size to on-disk superblock")
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 5b1fe7be 08-Aug-2018 Ilya Dryomov <idryomov@gmail.com>

dm cache metadata: set dirty on all cache blocks after a crash

Quoting Documentation/device-mapper/cache.txt:

The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly. So we treat it as a hint. In normal
operation it will be written when the dm device is suspended. If the
system crashes all cache blocks will be assumed dirty when restarted.

This got broken in commit f177940a8091 ("dm cache metadata: switch to
using the new cursor api for loading metadata") in 4.9, which removed
the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk
flag) when loading cache blocks. This results in data corruption on an
unclean shutdown with dirty cache blocks on the fast device. After the
crash those blocks are considered clean and may get evicted from the
cache at any time. This can be demonstrated by doing a lot of reads
to trigger individual evictions, but uncache is more predictable:

### Disable auto-activation in lvm.conf to be able to do uncache in
### time (i.e. see uncache doing flushing) when the fix is applied.

# xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb
# vgcreate vg_cache /dev/vdb /dev/vdc
# lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb
# lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc
# lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc
# lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev
# lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev
# xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# dmsetup status vg_cache-lv_slowdev
0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw -
^^^^
7065 * 64k = 441M yet to be written to the slow device
# echo b >/proc/sysrq-trigger

# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 0 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................
0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................

This is the case with both v1 and v2 cache pool metatata formats.

After applying this patch:

# vgchange -ay vg_cache
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
# lvconvert --uncache vg_cache/lv_slowdev
Flushing 3724 blocks for cache vg_cache/lv_slowdev.
...
Flushing 71 blocks for cache vg_cache/lv_slowdev.
Logical volume "lv_cachedev" successfully removed
Logical volume vg_cache/lv_slowdev is not cached.
# xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2
0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................

Cc: stable@vger.kernel.org
Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# fd2fa954 02-Aug-2018 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: save in-core policy_hint_size to on-disk superblock

policy_hint_size starts as 0 during __write_initial_superblock(). It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size). But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.

The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache. This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).

Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"

Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 6bdd0796 20-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

dm cache: convert dm_cache_metadata.ref_count from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable dm_cache_metadata.ref_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 10add84e 05-May-2017 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: fail operations if fail_io mode has been established

Otherwise it is possible to trigger crashes due to the metadata being
inaccessible yet these methods don't safely account for that possibility
without these checks.

Cc: stable@vger.kernel.org
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 73cbca6a 27-Apr-2017 Bart Van Assche <bvanassche@acm.org>

dm block manager: remove an unused argument from dm_block_manager_create()

The 'cache_size' argument of dm_block_manager_create() has never been
used. Remove it along with the definitions of the constants passed as
the 'cache_size' argument.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 0d963b6e 20-Mar-2017 Joe Thornber <ejt@redhat.com>

dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty

The dm_bitset_cursor_begin() call was using the incorrect nr_entries.
Also, the last dm_bitset_cursor_next() must be avoided if we're at the
end of the cursor.

Fixes: 7f1b21591a6 ("dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 7f1b2159 04-Oct-2016 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: use cursor api in blocks_are_clean_separate_dirty()

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 683bb1a3 22-Sep-2016 Joe Thornber <ejt@redhat.com>

dm cache metadata: use dm_bitset_new() to create the dirty bitset in format 2

Big speed up with large configs.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 48551054 04-Oct-2016 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: name the cache block that couldn't be loaded

Improves __load_mapping_v1() and __load_mapping_v2() DMERR messages to
explicitly name the cache block number whose mapping couldn't be
loaded.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 629d0a8a 22-Sep-2016 Joe Thornber <ejt@redhat.com>

dm cache metadata: add "metadata2" feature

If "metadata2" is provided as a table argument when creating/loading a
cache target a more compact metadata format, with separate dirty bits,
is used. "metadata2" improves speed of shutting down a cache target.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ae4a46a1 03-Oct-2016 Joe Thornber <ejt@redhat.com>

dm cache metadata: use bitset cursor api to load discard bitset

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 07d93882 03-Oct-2016 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: remove an extra newline in DMERR and code

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# f177940a 20-Sep-2016 Joe Thornber <ejt@redhat.com>

dm cache metadata: switch to using the new cursor api for loading metadata

This change offers a pretty significant performance improvement.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4e781b49 15-Sep-2016 Joe Thornber <ejt@redhat.com>

dm cache: speed up writing of the hint array

It's far quicker to always delete the hint array and recreate with
dm_array_new() because we avoid the copying caused by mutation.

Also simplifies the policy interface, replacing the walk_hints() with
the simpler get_hint().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 6545b60b 16-Apr-2016 Ahmed Samy <f.fallen45@gmail.com>

dm cache metadata: fix cmd_read_lock() acquiring write lock

Commit 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and
cleanup WRITE_LOCK macros") uses down_write() instead of down_read() in
cmd_read_lock(), yet up_read() is used to release the lock in
READ_UNLOCK(). Fix it.

Fixes: 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros")
Cc: stable@vger.kernel.org
Signed-off-by: Ahmed Samy <f.fallen45@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 9567366f 11-Apr-2016 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros

The READ_LOCK macro was incorrectly returning -EINVAL if
dm_bm_is_read_only() was true -- it will always be true once the cache
metadata transitions to read-only by dm_cache_metadata_set_read_only().

Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0).
Also, all accesses of the 'cmd' argument passed to these related macros
are now encapsulated in parenthesis.

A follow-up patch can be developed to eliminate the use of macros in
favor of pure C code. Avoiding that now given that this needs to apply
to stable@.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d14fcf3dd79 ("dm cache: make sure every metadata function checks fail_io")
Cc: stable@vger.kernel.org


# d14fcf3d 10-Mar-2016 Joe Thornber <ejt@redhat.com>

dm cache: make sure every metadata function checks fail_io

Otherwise operations may be attempted that will only ever go on to crash
(since the metadata device is either missing or unreliable if 'fail_io'
is set).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 4c7da06f 22-Oct-2015 Mikulas Patocka <mpatocka@redhat.com>

dm persistent data: eliminate unnecessary return values

dm_bm_unlock and dm_tm_unlock return an integer value but the returned
value is always 0. The calling code sometimes checks the return value
and sometimes doesn't.

Eliminate these unnecessary return values and also the checks for them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 3201ac45 22-Oct-2015 Joe Thornber <ejt@redhat.com>

dm cache: the CLEAN_SHUTDOWN flag was not being set

If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
blocks are marked as dirty and a full writeback occurs.

__commit_transaction() is responsible for setting/clearing
CLEAN_SHUTDOWN (based the flags_mutator that is passed in).

Fix this issue, of the cache's on-disk flags being wrong, by making sure
__commit_transaction() does not reset the flags after the mutator has
altered the flags in preparation for them being serialized to disk.

before:

sb_flags = mutator(le32_to_cpu(disk_super->flags));
disk_super->flags = cpu_to_le32(sb_flags);
disk_super->flags = cpu_to_le32(cmd->flags);

after:

disk_super->flags = cpu_to_le32(cmd->flags);
sb_flags = mutator(le32_to_cpu(disk_super->flags));
disk_super->flags = cpu_to_le32(sb_flags);

Reported-by: Bogdan Vasiliev <bogdan.vasiliev@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 028ae9f7 22-Apr-2015 Joe Thornber <ejt@redhat.com>

dm cache: add fail io mode and needs_check flag

If a cache metadata operation fails (e.g. transaction commit) the
cache's metadata device will abort the current transaction, set a new
needs_check flag, and the cache will transition to "read-only" mode. If
aborting the transaction or setting the needs_check flag fails the cache
will transition to "fail-io" mode.

Once needs_check is set the cache device will not be allowed to
activate. Activation requires write access to metadata. Future work is
needed to add proper support for running the cache in read-only mode.

Once in fail-io mode the cache will report a status of "Fail".

Also, add commit() wrapper that will disallow commits if in read_only or
fail mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 766a7888 27-Jan-2015 Joe Thornber <ejt@redhat.com>

dm cache: fix missing ERR_PTR returns and handling

Commit 9b1cc9f251 ("dm cache: share cache-metadata object across
inactive and active DM tables") mistakenly ignored the use of ERR_PTR
returns. Restore missing IS_ERR checks and ERR_PTR returns where
appropriate.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 9b1cc9f2 23-Jan-2015 Joe Thornber <ejt@redhat.com>

dm cache: share cache-metadata object across inactive and active DM tables

If a DM table is reloaded with an inactive table when the device is not
suspended (normal procedure for LVM2), then there will be two dm-bufio
objects that can diverge. This can lead to a situation where the
inactive table uses bufio to read metadata at the same time the active
table writes metadata -- resulting in the inactive table having stale
metadata buffers once it is promoted to the active table slot.

Fix this by using reference counting and a global list of cache metadata
objects to ensure there is only one metadata object per metadata device.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 1bad9bc4 07-Nov-2014 Joe Thornber <ejt@redhat.com>

dm cache: revert "remove remainder of distinct discard block size"

This reverts commit 64ab346a360a4b15c28fb8531918d4a01f4eabd9 because we
actually do want to allow the discard blocksize to be larger than the
cache blocksize. Further dm-cache discard changes will make this
possible.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 895b47d7 14-Jul-2014 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: use dm-space-map-metadata.h defined size limits

Commit 7d48935e cleaned up the persistent-data's space-map-metadata
limits by elevating them to dm-space-map-metadata.h. Update
dm-cache-metadata to use these same limits.

The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the
sizeof the disk_bitmap_header. So the supported maximum metadata size
is a bit smaller (reduced from 33423360 to 33292800 sectors).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>


# 048e5a07 14-Jul-2014 Mike Snitzer <snitzer@redhat.com>

dm cache metadata: do not allow the data block size to change

The block size for the dm-cache's data device must remained fixed for
the life of the cache. Disallow any attempt to change the cache's data
block size.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org


# 0596661f 03-Apr-2014 Joe Thornber <ejt@redhat.com>

dm cache: fix a lock-inversion

When suspending a cache the policy is walked and the individual policy
hints written to the metadata via sync_metadata(). This led to this
lock order:

policy->lock
cache_metadata->root_lock

When loading the cache target the policy is populated while the metadata
lock is held:

cache_metadata->root_lock
policy->lock

Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
ensuring the cache_metadata root_lock is held whilst all the hints are
written, rather than being repeatedly locked while policy->lock is held
(as was the case with each callout that policy_walk_mappings() made to
the old save_hint() method).

Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
locking correctness") build option. However, it is not clear how the
LOCKDEP reported paths can lead to a deadlock since the two paths,
suspending a target and loading a target, never occur at the same time.
But that doesn't mean the same lock-inversion couldn't have occurred
elsewhere.

Reported-by: Marian Csontos <mcsontos@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 5a32083d 27-Mar-2014 Joe Thornber <ejt@redhat.com>

dm: take care to copy the space map roots before locking the superblock

In theory copying the space map root can fail, but in practice it never
does because we're careful to check what size buffer is needed.

But make certain we're able to copy the space map roots before
locking the superblock.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed


# a9d45396 27-Mar-2014 Joe Thornber <ejt@redhat.com>

dm transaction manager: fix corruption due to non-atomic transaction commit

The persistent-data library used by dm-thin, dm-cache, etc is
transactional. If anything goes wrong, such as an io error when writing
new metadata or a power failure, then we roll back to the last
transaction.

Atomicity when committing a transaction is achieved by:

a) Never overwriting data from the previous transaction.
b) Writing the superblock last, after all other metadata has hit the
disk.

This commit and the following commit ("dm: take care to copy the space
map roots before locking the superblock") fix a bug associated with (b).
When committing it was possible for the superblock to still be written
in spite of an io error occurring during the preceeding metadata flush.
With these commits we're careful not to take the write lock out on the
superblock until after the metadata flush has completed.

Change the transaction manager's semantics for dm_tm_commit() to assume
all data has been flushed _before_ the single superblock that is passed
in.

As a prerequisite, split the block manager's block unlocking and
flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now
the unlocking must be done separately.

This issue was discovered by forcing io errors at the crucial time
using dm-flakey.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 64ab346a 27-Mar-2014 Heinz Mauelshagen <heinzm@redhat.com>

dm cache: remove remainder of distinct discard block size

Discard block size not being equal to cache block size causes data
corruption by erroneously avoiding migrations in issue_copy() because
the discard state is being cleared for a group of cache blocks when it
should not.

Completely remove all code that enabled a distinction between the
cache block size and discard block size.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 53d49819 16-Oct-2013 Joe Thornber <ejt@redhat.com>

dm cache metadata: check the metadata version when reading the superblock

Need to check the version to verify on-disk metadata is supported.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 2ee57d58 24-Oct-2013 Joe Thornber <ejt@redhat.com>

dm cache: add passthrough mode

"Passthrough" is a dm-cache operating mode (like writethrough or
writeback) which is intended to be used when the cache contents are not
known to be coherent with the origin device. It behaves as follows:

* All reads are served from the origin device (all reads miss the cache)
* All writes are forwarded to the origin device; additionally, write
hits cause cache block invalidates

This mode decouples cache coherency checks from cache device creation,
largely to avoid having to perform coherency checks while booting. Boot
scripts can create cache devices in passthrough mode and put them into
service (mount cached filesystems, for example) without having to worry
about coherency. Coherency that exists is maintained, although the
cache will gradually cool as writes take place.

Later, applications can perform coherency checks, the nature of which
will depend on the type of the underlying storage. If coherency can be
verified, the cache device can be transitioned to writethrough or
writeback mode while still warm; otherwise, the cache contents can be
discarded prior to transitioning to the desired operating mode.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# f494a9c6 31-Oct-2013 Joe Thornber <ejt@redhat.com>

dm cache: cache shrinking support

Allow a cache to shrink if the blocks being removed from the cache are
not dirty.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# dd8b0c20 24-Oct-2013 Joe Thornber <ejt@redhat.com>

dm cache metadata: return bool from __superblock_all_zeroes

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 8c5008fa 10-May-2013 Joe Thornber <ejt@redhat.com>

dm cache: replace memcpy with struct assignment

Use struct assignment rather than memcpy in dm cache.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# ea2dd8c1 20-Mar-2013 Mike Snitzer <snitzer@redhat.com>

dm cache: policy ignore hints if generated by different version

When reading the dm cache metadata from disk, ignore the policy hints
unless they were generated by the same major version number of the same
policy module.

The hints are considered to be private data belonging to the specific
module that generated them and there is no requirement for them to make
sense to different versions of the policy that generated them.
Policy modules are all required to work fine if no previous hints are
supplied (or if existing hints are lost).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 4e7f506f 20-Mar-2013 Mike Snitzer <snitzer@redhat.com>

dm cache: policy change version from string to integer set

Separate dm cache policy version string into 3 unsigned numbers
corresponding to major, minor and patchlevel and store them at the end
of the on-disk metadata so we know which version of the policy generated
the hints in case a future version wants to use them differently.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 79ed9caf 20-Mar-2013 Joe Thornber <ejt@redhat.com>

dm cache: metadata clear dirty bits on clean shutdown

When writing the dirty bitset to the metadata device on a clean
shutdown, clear the dirty bits. Previously they were left indicating
the cache was dirty. This led to confusion about whether there really
was dirty data in the cache or not. (This was a harmless bug.)

Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# c6b4fcba 01-Mar-2013 Joe Thornber <ejt@redhat.com>

dm: add cache target

Add a target that allows a fast device such as an SSD to be used as a
cache for a slower device such as a disk.

A plug-in architecture was chosen so that the decisions about which data
to migrate and when are delegated to interchangeable tunable policy
modules. The first general purpose module we have developed, called
"mq" (multiqueue), follows in the next patch. Other modules are
under development.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>