History log of /linux-master/drivers/md/persistent-data/dm-space-map-metadata.c
Revision Date Author Comments
# 0ef0b471 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing empty lines

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 86a3238c 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: change "unsigned" to "unsigned int"

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3bd94003 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing SPDX-License-Indentifiers

'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.

Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# be500ed7 13-Apr-2021 Joe Thornber <ejt@redhat.com>

dm space maps: improve performance with inc/dec on ranges of blocks

When we break sharing on btree nodes we typically need to increment
the reference counts to every value held in the node. This can
cause a lot of repeated calls to the space maps. Fix this by changing
the interface to the space map inc/dec methods to take ranges of
adjacent blocks to be operated on.

For installations that are using a lot of snapshots this will reduce
cpu overhead of fundamental operations such as provisioning a new block,
or deleting a snapshot, by as much as 10 times.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 5faafc77 13-Apr-2021 Joe Thornber <ejt@redhat.com>

dm space maps: don't reset space map allocation cursor when committing

Current commit code resets the place where the search for free blocks
will begin back to the start of the metadata device. There are a couple
of repercussions to this:

- The first allocation after the commit is likely to take longer than
normal as it searches for a free block in an area that is likely to
have very few free blocks (if any).

- Any free blocks it finds will have been recently freed. Reusing them
means we have fewer old copies of the metadata to aid recovery from
hardware error.

Fix these issues by leaving the cursor alone, only resetting when the
search hits the end of the metadata device.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4feaef83 07-Jan-2020 Joe Thornber <ejt@redhat.com>

dm space map common: fix to ensure new block isn't already in use

The space-maps track the reference counts for disk blocks allocated by
both the thin-provisioning and cache targets. There are variants for
tracking metadata blocks and data blocks.

Transactionality is implemented by never touching blocks from the
previous transaction, so we can rollback in the event of a crash.

When allocating a new block we need to ensure the block is free (has
reference count of 0) in both the current and previous transaction.
Prior to this fix we were doing this by searching for a free block in
the previous transaction, and relying on a 'begin' counter to track
where the last allocation in the current transaction was. This
'begin' field was not being updated in all code paths (eg, increment
of a data block reference count due to breaking sharing of a neighbour
block in the same btree leaf).

This fix keeps the 'begin' field, but now it's just a hint to speed up
the search. Instead the current transaction is searched for a free
block, and then the old transaction is double checked to ensure it's
free. Much simpler.

This fixes reports of sm_disk_new_block()'s BUG_ON() triggering when
DM thin-provisioning's snapshots are heavily used.

Reported-by: Eric Wheeler <dm-devel@lists.ewheeler.net>
Cc: stable@vger.kernel.org
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ae148243 18-Aug-2019 ZhangXiaoxu <zhangxiaoxu5@huawei.com>

dm space map metadata: fix missing store of apply_bops() return value

In commit 6096d91af0b6 ("dm space map metadata: fix occasional leak
of a metadata block on resize"), we refactor the commit logic to a new
function 'apply_bops'. But when that logic was replaced in out() the
return value was not stored. This may lead out() returning a wrong
value to the caller.

Fixes: 6096d91af0b6 ("dm space map metadata: fix occasional leak of a metadata block on resize")
Cc: stable@vger.kernel.org
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# fbc61291 01-Oct-2017 Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>

dm space map metadata: use ARRAY_SIZE

Using the ARRAY_SIZE macro improves the readability of the code.

Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# b79af13e 15-Feb-2017 Bhumika Goyal <bhumirks@gmail.com>

dm space map metadata: constify dm_space_map structures

Declare dm_space_map structures as const as they are only passed as an
argument to the function memcpy. This argument is of type const void *,
so dm_space_map structures having this property can be declared as
const.

File size before:
text data bss dec hex filename
4889 240 0 5129 1409 dm-space-map-metadata.o

File size after:
text data bss dec hex filename
5139 0 0 5139 1413 dm-space-map-metadata.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 314c25c5 30-Nov-2016 Benjamin Marzinski <bmarzins@redhat.com>

dm space map metadata: fix 'struct sm_metadata' leak on failed create

In dm_sm_metadata_create() we temporarily change the dm_space_map
operations from 'ops' (whose .destroy function deallocates the
sm_metadata) to 'bootstrap_ops' (whose .destroy function doesn't).

If dm_sm_metadata_create() fails in sm_ll_new_metadata() or
sm_ll_extend(), it exits back to dm_tm_create_internal(), which calls
dm_sm_destroy() with the intention of freeing the sm_metadata, but it
doesn't (because the dm_space_map operations is still set to
'bootstrap_ops').

Fix this by setting the dm_space_map operations back to 'ops' if
dm_sm_metadata_create() fails when it is set to 'bootstrap_ops'.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 51216778 14-Dec-2015 Mike Snitzer <snitzer@redhat.com>

dm space map metadata: remove unused variable in brb_pop()

Remove the unused struct block_op pointer that was inadvertantly
introduced, via cut-and-paste of previous brb_op() code, as part of
commit 50dd842ad.

(Cc'ing stable@ because commit 50dd842ad did)

Fixes: 50dd842ad ("dm space map metadata: fix ref counting bug when bootstrapping a new space map")
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 50dd842a 09-Dec-2015 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix ref counting bug when bootstrapping a new space map

When applying block operations (BOPs) do not remove them from the
uncommitted BOP ring-buffer until after they've been applied -- in case
we recurse.

Also, perform BOP_INC operation, in dm_sm_metadata_create() and
sm_metadata_extend(), in terms of the uncommitted BOP ring-buffer rather
than using direct calls to sm_ll_inc().

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 6096d91a 17-Jun-2015 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix occasional leak of a metadata block on resize

The metadata space map has a simplified 'bootstrap' mode that is
operational when extending the space maps. Whilst in this mode it's
possible for some refcount decrement operations to become queued (eg, as
a result of shadowing one of the bitmap indexes). These decrements were
not being applied when switching out of bootstrap mode.

The effect of this bug was the leaking of a 4k metadata block. This is
detected by the latest version of thin_check as a non fatal error.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# 02717d98 01-Dec-2014 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix sm_bootstrap_get_count()

Must set 'result' accordingly rather than return it.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# c1c6156f 29-Nov-2014 Dan Carpenter <dan.carpenter@oracle.com>

dm space map metadata: fix sm_bootstrap_get_nr_blocks()

This function isn't right and it causes a static checker warning:

drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
error: potentially using uninitialized 'sb_data_size'.

It should set "*count" and return zero on success the same as the
sm_metadata_get_nr_blocks() function does earlier.

Fixes: 3241b1d3e0aa ('dm: add persistent data library')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# cebc2de4 07-Mar-2014 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix refcount decrement below 0 which caused corruption

This has been a relatively long-standing issue that wasn't nailed down
until Teng-Feng Yang's meticulous bug report to dm-devel on 3/7/2014,
see: http://www.redhat.com/archives/dm-devel/2014-March/msg00021.html

From that report:
"When decreasing the reference count of a metadata block with its
reference count equals 3, we will call dm_btree_remove() to remove
this enrty from the B+tree which keeps the reference count info in
metadata device.

The B+tree will try to rebalance the entry of the child nodes in each
node it traversed, and the rebalance process contains the following
steps.

(1) Finding the corresponding children in current node (shadow_current(s))
(2) Shadow the children block (issue BOP_INC)
(3) redistribute keys among children, and free children if necessary (issue BOP_DEC)

Since the update of a metadata block's reference count could be
recursive, we will stash these reference count update operations in
smm->uncommitted and then process them in a FILO fashion.

The problem is that step(3) could free the children which is created
in step(2), so the BOP_DEC issued in step(3) will be carried out
before the BOP_INC issued in step(2) since these BOPs will be
processed in FILO fashion. Once the BOP_DEC from step(3) tries to
decrease the reference count of newly shadow block, it will report
failure for its reference equals 0 before decreasing. It looks like we
can solve this issue by processing these BOPs in a FIFO fashion
instead of FILO."

Commit 5b564d80 ("dm space map: disallow decrementing a reference count
below zero") changed the code to report an error for this temporary
refcount decrement below zero. So what was previously a harmless
invalid refcount became a hard failure due to the new error path:

device-mapper: space map common: unable to decrement a reference count below 0
device-mapper: thin: 253:6: dm_thin_insert_block() failed: error = -22
device-mapper: thin: 253:6: switching pool to read-only mode

This bug is in dm persistent-data code that is common to the DM thin and
cache targets. So any users of those targets should apply this fix.

Fix this by applying recursive space map operations in FIFO order rather
than FILO.

Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=68801

Reported-by: Apollon Oikonomopoulos <apoikos@debian.org>
Reported-by: edwillam1007@gmail.com
Reported-by: Teng-Feng Yang <shinrairis@gmail.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.13+


# 7d48935e 12-Feb-2014 Mike Snitzer <snitzer@redhat.com>

dm thin: allow metadata space larger than supported to go unused

It was always intended that a user could provide a thin metadata device
that is larger than the max supported by the on-disk format. The extra
space would just go unused.

Unfortunately that never worked. If the user attempted to use a larger
metadata device on creation they would get an error like the following:

device-mapper: space map common: space map too large
device-mapper: transaction manager: couldn't create metadata space map
device-mapper: thin metadata: tm_create_with_sm failed
device-mapper: table: 252:17: thin-pool: Error creating metadata object
device-mapper: ioctl: error adding target to table

Fix this by allowing the initial metadata space map creation to cap its
size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).

Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
the sizeof the disk_bitmap_header. So the supported maximum metadata
size is a bit smaller (reduced from 33423360 to 33292800 sectors).

Lastly, remove the "excess space will not be used" warning message from
get_metadata_dev_size(); it resulted in printing the warning multiple
times. Factor out warn_if_metadata_device_too_big(), call it from
pool_ctr() and maybe_resize_metadata_dev().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>


# fca02843 21-Jan-2014 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix bug in resizing of thin metadata

This bug was introduced in commit 7e664b3dec431e ("dm space map metadata:
fix extending the space map").

When extending a dm-thin metadata volume we:

- Switch the space map into a simple bootstrap mode, which allocates
all space linearly from the newly added space.
- Add new bitmap entries for the new space
- Increment the reference counts for those newly allocated bitmap
entries
- Commit changes to disk
- Switch back out of bootstrap mode.

But, the disk commit may allocate space itself, if so this fact will be
lost when switching out of bootstrap mode.

The bug exhibited itself as an error when the bitmap_root, with an
erroneous ref count of 0, was subsequently decremented as part of a
later disk commit. This would cause the disk commit to fail, and thinp
to enter read_only mode. The metadata was not damaged (thin_check
passed).

The fix is to put the increments + commit into a loop, running until
the commit has not allocated extra space. In practise this loop only
runs twice.

With this fix the following device mapper testsuite test passes:
dmtest run --suite thin-provisioning -n thin_remove_works_after_resize

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # depends on commit 7e664b3dec431e


# 7e664b3d 07-Jan-2014 Joe Thornber <ejt@redhat.com>

dm space map metadata: fix extending the space map

When extending a metadata space map we should do the first commit whilst
still in bootstrap mode -- a mode where all blocks get allocated in the
new area.

That way the commit overhead is allocated from the newly added space.
Otherwise we risk running out of space.

With this fix, and the previous commit "dm space map common: make sure
new space is used during extend", the following device mapper testsuite
test passes:
dmtest run --suite thin-provisioning -n /resize_metadata_no_io/

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# c46985e2 13-Dec-2013 Mike Snitzer <snitzer@redhat.com>

dm space map metadata: limit errors in sm_metadata_new_block

The "unable to allocate new metadata block" error can be a particularly
verbose error if there is a systemic issue with the metadata device.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>


# f62b6b8f 02-Dec-2013 Mike Snitzer <snitzer@redhat.com>

dm space map metadata: return on failure in sm_metadata_new_block

Commit 2fc48021f4afdd109b9e52b6eef5db89ca80bac7 ("dm persistent
metadata: add space map threshold callback") introduced a regression
to the metadata block allocation path that resulted in errors being
ignored. This regression was uncovered by running the following
device-mapper-test-suite test:
dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/

The ignored error codes in sm_metadata_new_block() could crash the
kernel through use of either the dm-thin or dm-cache targets, e.g.:

device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
device-mapper: space map metadata: unable to allocate new metadata block
general protection fault: 0000 [#1] SMP
...
Workqueue: dm-thin do_worker [dm_thin_pool]
task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000
RIP: 0010:[<ffffffffa0331385>] [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data]
RSP: 0018:ffff88021a055a68 EFLAGS: 00010202
RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78
RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070
RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010
R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4
R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4
FS: 0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0
Stack:
ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001
ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000
ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c
Call Trace:
[<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data]
[<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data]
[<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data]
[<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data]
[<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data]
[<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data]
[<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data]
[<ffffffff81520036>] ? down_write+0x16/0x40
[<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool]
[<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool]
[<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool]
[<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool]
[<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool]
[<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0
[<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
[<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool]
[<ffffffff81067872>] process_one_work+0x182/0x3b0
[<ffffffff81068c90>] worker_thread+0x120/0x3a0
[<ffffffff81068b70>] ? manage_workers+0x160/0x160
[<ffffffff8106eb2e>] kthread+0xce/0xe0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0
[<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Cc: stable@vger.kernel.org # v3.10+


# 2fc48021 10-May-2013 Joe Thornber <ejt@redhat.com>

dm persistent metadata: add space map threshold callback

Add a threshold callback to dm persistent data space maps.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 7c3d3f2a 10-May-2013 Joe Thornber <ejt@redhat.com>

dm persistent data: add threshold callback to space map

Add a threshold callback function to the persistent data space map
interface for a subsequent patch to use.

dm-thin and dm-cache are interested in knowing when they're getting
low on metadata or data blocks. This patch introduces a new method
for registering a callback against a threshold.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 1921c56d 10-May-2013 Joe Thornber <ejt@redhat.com>

dm persistent data: support space map resizing

Support extending a dm persistent data metadata space map.

The extend itself is implemented by switching back to the boostrap
allocator and pointing to the new space. The extra bitmap indexes are
then allocated from the new space, and finally we switch back to the
proper space map ops and tweak the reference counts.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 88a488f6 10-May-2013 Joe Thornber <ejt@redhat.com>

dm persistent data: fix error message typos

Fix some typos in dm-space-map-metadata.c error messages.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 7960123f 21-Dec-2012 Joe Thornber <ejt@redhat.com>

dm persistent data: improve improve space map block alloc failure message

Improve space map error message when unable to allocate a new
metadata block.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 3241b1d3 31-Oct-2011 Joe Thornber <thornber@redhat.com>

dm: add persistent data library

The persistent-data library offers a re-usable framework for the storage
and management of on-disk metadata in device-mapper targets.

It's used by the thin-provisioning target in the next patch and in an
upcoming hierarchical storage target.

For further information, please read
Documentation/device-mapper/persistent-data.txt

Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>