History log of /linux-master/drivers/md/dm-snap-persistent.c
Revision Date Author Comments
# 6e5f0f63 23-Jan-2024 Hongyu Jin <hongyu.jin@unisoc.com>

dm io: Support IO priority

Some IO will dispatch from kworker with different io_context settings
than the submitting task, we may need to specify a priority to avoid
losing priority.

Add IO priority parameter to dm_io() and update all callers.

Co-developed-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 26cb62a2 16-Mar-2023 Yu Zhe <yuzhe@nfschina.com>

dm: remove unnecessary (void*) conversions

Pointer variables of void * type do not require type cast.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2e84fecf 03-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid split of quoted strings where possible

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 0ef0b471 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing empty lines

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 02f10ba1 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add argument identifier names

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 8ca817c4 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid spaces before function arguments or in favour of tabs

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 03b18887 30-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: fix trailing statements

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# a4a82ce3 26-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: correct block comments format.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 86a3238c 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: change "unsigned" to "unsigned int"

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3bd94003 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing SPDX-License-Indentifiers

'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.

Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 0fcb100d 22-Jul-2022 Nathan Huckleberry <nhuck@google.com>

dm bufio: Add flags argument to dm_bufio_client_create

Add a flags argument to dm_bufio_client_create and update all the
callers. This is in preparation to add the DM_BUFIO_NO_SLEEP flag.

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 6b990139 14-Jul-2022 Bart Van Assche <bvanassche@acm.org>

dm-snap: Combine request operation type and flags

Pass the request operation and its flags as a single argument to improve
kernel code uniformity.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-29-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 581075e4 14-Jul-2022 Bart Van Assche <bvanassche@acm.org>

dm/core: Reduce the size of struct dm_io_request

Combine the bi_op and bi_op_flags into the bi_opf member. Use the new
blk_opf_t type to improve static type checking. This patch does not
change any functionality.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-22-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8ec45662 12-Jul-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

dm: update target status functions to support IMA measurement

For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.

Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
01. cache
02. crypt
03. integrity
04. linear
05. mirror
06. multipath
07. raid
08. snapshot
09. striped
10. verity

For rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.

For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.

/etc/ima/ima-policy
measure func=CRITICAL_DATA label=device-mapper template=ima-buf

The measurements will be reflected in the IMA logs, which are located at:

/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements

These IMA logs can later be consumed by various attestation clients
running on the system, and send them to external services for attesting
the system.

The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 7a35693a 07-Apr-2021 Matthew Wilcox (Oracle) <willy@infradead.org>

dm: replace dm_vcalloc()

Use kvcalloc or kvmalloc_array instead (depending whether zeroing is
useful).

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 7d837c0d 21-Sep-2020 Qinglang Miao <miaoqinglang@huawei.com>

dm snap persistent: simplify area_io()

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# ed00aabd 01-Jul-2020 Christoph Hellwig <hch@lst.de>

block: rename generic_make_request to submit_bio_noacct

generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ad6bf88a 15-Jan-2020 Mikulas Patocka <mpatocka@redhat.com>

block: fix an integer overflow in logical block size

Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.

For exmaple (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# afa53df8 15-Mar-2018 Mikulas Patocka <mpatocka@redhat.com>

dm bufio: move dm-bufio.h to include/linux/

Move dm-bufio.h to include/linux/ so that external GPL'd DM target
modules can use it.

It is better to allow the use of dm-bufio than force external modules
to implement the equivalent buffered IO mechanism in some new way. The
hope is this will encourage the use of dm-bufio; which will then make it
easier for a GPL'd external DM target module to be included upstream.

A couple dm-bufio EXPORT_SYMBOL exports have also been updated to use
EXPORT_SYMBOL_GPL.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ff0361b3 31-May-2017 Jan Kara <jack@suse.cz>

dm: make flush bios explicitly sync

Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as
synchronous") removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions. generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous")
Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 70fd7614 01-Nov-2016 Christoph Hellwig <hch@lst.de>

block,fs: use REQ_* flags directly

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly. Where applicable this also drops usage of the
bio_set_op_attrs wrapper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e6047149 05-Jun-2016 Mike Christie <mchristi@redhat.com>

dm: use bio op accessors

Separate the op from the rq_flag_bits and have dm
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 385277bf 08-Jan-2016 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: fix hung bios when copy error occurs

When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was error copying the
chunk, and then calls complete_exception. complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps->pending_count and
persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not. This results in the variable
ps->pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful. A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store. Also, remove commit_callback now that
it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org


# a3d939ae 02-Oct-2015 Mikulas Patocka <mpatocka@redhat.com>

dm: convert ffs to __ffs

ffs counts bit starting with 1 (for the least significant bit), __ffs
counts bits starting with 0. This patch changes various occurrences of ffs
to __ffs and removes subtraction of 1 from the result.

Note that __ffs (unlike ffs) is not defined when called with zero
argument, but it is not called with zero argument in any of these cases.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# a2a678ed 13-Oct-2015 Sudip Mukherjee <sudipm.mukherjee@gmail.com>

dm snapshot persistent: fix missing cleanup in persistent_ctr error path

If an unsupported option is given then the early return from
persistent_ctr() leaked memory allocated for the 'pstore' and never
destroyed the 'metadata_wq'.

Fixes: b0d3cc011e53 ("dm snapshot: add new persistent store option to support overflow")
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# b0d3cc01 08-Oct-2015 Mike Snitzer <snitzer@redhat.com>

dm snapshot: add new persistent store option to support overflow

Commit 76c44f6d80 introduced the possibly for "Overflow" to be reported
by the snapshot device's status. Older userspace (e.g. lvm2) does not
handle the "Overflow" status response.

Fix this incompatibility by requiring newer userspace code, that can
cope with "Overflow", request the persistent store with overflow support
by using "PO" (Persistent with Overflow) for the snapshot store type.

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Fixes: 76c44f6d80 ("dm snapshot: don't invalidate on-disk image on snapshot write overflow")
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# fc0a4461 10-Aug-2015 Viresh Kumar <viresh.kumar@linaro.org>

dm: remove unlikely() before IS_ERR()

IS_ERR() already contains an 'unlikely' compiler flag so there is no
need to do that again from IS_ERR() callers.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 0c8f8632 02-Feb-2015 Markus Elfring <elfring@users.sourceforge.net>

dm snapshot: remove unnecessary NULL checks before vfree() calls

The vfree() function performs input parameter validation.
Thus the NULL pointer test around vfree() calls is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 2c945820 03-Mar-2014 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: fix metadata corruption

Commit 55494bf2947dccdf2 ("dm snapshot: use dm-bufio") broke snapshots.
Before that 3.14-rc1 commit, loading a snapshot's list of exceptions
involved reading exception areas one by one into ps->area and inserting
those exceptions into the hash table. Commit 55494bf2947dccdf2 changed
it so that dm-bufio with prefetch is used to load exceptions in batchs.
Exceptions are loaded correctly, but ps->area is left uninitialized.
When a new exception is allocated, it is stored in this uninitialized
ps->area which will be written to the disk. This causes metadata
corruption.

Fix this corruption by copying the last area that was read via dm-bufio
into ps->area.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 55b082e6 13-Jan-2014 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: use dm-bufio prefetch

This patch modifies dm-snapshot so that it prefetches the buffers when
loading the exceptions.

The number of buffers read ahead is specified in the DM_PREFETCH_CHUNKS
macro. The current value for DM_PREFETCH_CHUNKS (12) was found to
provide the best performance on a single 15k SCSI spindle. In the
future we may modify this default or make it configurable.

Also, introduce the function dm_bufio_set_minimum_buffers to setup
bufio's number of internal buffers before freeing happens. dm-bufio may
hold more buffers if enough memory is available. There is no guarantee
that the specified number of buffers will be available - if you need a
guarantee, use the argument reserved_buffers for
dm_bufio_client_create.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 55494bf2 13-Jan-2014 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: use dm-bufio

Use dm-bufio for initial loading of the exceptions.
Introduce a new function dm_bufio_forget that frees the given buffer.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 2cadabd5 13-Jan-2014 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: prepare for switch to using dm-bufio

Change the functions get_exception, read_exception and insert_exceptions
so that ps->area is passed as an argument.

This patch doesn't change any functionality, but it refactors the code
to allow for a cleaner switch over to using dm-bufio.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# c1a64160 07-Jan-2014 Chuansheng Liu <chuansheng.liu@intel.com>

dm snapshot: call destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()

In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().

Signed-off-by: Liu, Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# e9c6a182 15-Oct-2013 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: fix data corruption

This patch fixes a particular type of data corruption that has been
encountered when loading a snapshot's metadata from disk.

When we allocate a new chunk in persistent_prepare, we increment
ps->next_free and we make sure that it doesn't point to a metadata area
by further incrementing it if necessary.

When we load metadata from disk on device activation, ps->next_free is
positioned after the last used data chunk. However, if this last used
data chunk is followed by a metadata area, ps->next_free is positioned
erroneously to the metadata area. A newly-allocated chunk is placed at
the same location as the metadata area, resulting in data or metadata
corruption.

This patch changes the code so that ps->next_free skips the metadata
area when metadata are loaded in function read_exceptions.

The patch also moves a piece of code from persistent_prepare_exception
to a separate function skip_metadata to avoid code duplication.

CVE-2013-4299

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 5ea330a7 18-Sep-2013 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: workaround for a false positive lockdep warning

The kernel reports a lockdep warning if a snapshot is invalidated because
it runs out of space.

The lockdep warning was triggered by commit 0976dfc1d0cd80a4e9dfaf87bd87
("workqueue: Catch more locking problems with flush_work()") in v3.5.

The warning is false positive. The real cause for the warning is that
the lockdep engine treats different instances of md->lock as a single
lock.

This patch is a workaround - we use flush_workqueue instead of flush_work.
This code path is not performance sensitive (it is called only on
initialization or invalidation), thus it doesn't matter that we flush the
whole workqueue.

The real fix for the problem would be to teach the lockdep engine to treat
different instances of md->lock as separate locks.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.5+


# daaa5f7c 27-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com>

md: Add in export.h for files using EXPORT_SYMBOL

These files were getting the defines for EXPORT_SYMBOL because
device.h was including module.h. But we are going to put an
end to that. So add the proper export.h include now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>


# a2d2b034 01-Aug-2011 Jonathan Brassow <jbrassow@redhat.com>

dm snapshot: style cleanups

Coding style cleanups.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>


# e29e65aa 01-Aug-2011 Joe Perches <joe@perches.com>

dm: use vzalloc

Use vzalloc() instead of vmalloc()+memset().

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 283a8328 01-Aug-2011 Alasdair G Kergon <agk@redhat.com>

dm: suppress endian warnings

Suppress sparse warnings about cpu_to_le32() by using __le32 types for
on-disk data etc.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 762a80d9 01-Aug-2011 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: flush disk cache when merging

This patch makes dm-snapshot flush disk cache when writing metadata for
merging snapshot.

Without cache flushing the disk may reorder metadata write and other
data writes and there is a possibility of data corruption in case of
power fault.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# bda8efec 29-May-2011 Mikulas Patocka <mpatocka@redhat.com>

dm io: use fixed initial mempool size

Replace the arbitrary calculation of an initial io struct mempool size
with a constant.

The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.

Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 239c8dd5 13-Jan-2011 Tejun Heo <tj@kernel.org>

dm snapshot: persistent make metadata_wq multithreaded

metadata_wq serves on-stack work items from chunk_io(). Even if
multiple chunk_io() are simultaneously in progress, each is
independent and queued only once, so multithreaded workqueue can be
safely used.

Switch metadata_wq to multithread and flush the work item instead of
the workqueue in chunk_io().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 4d4d66ab 13-Jan-2011 Tejun Heo <tj@kernel.org>

dm: convert workqueues to alloc_ordered

Convert all create[_singlethread]_work() users to the new
alloc[_ordered]_workqueue(). This conversion is mechanical and
doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# ca1cab37 26-Oct-2010 Andrew Morton <akpm@linux-foundation.org>

workqueues: s/ON_STACK/ONSTACK/

Silly though it is, completions and wait_queue_heads use foo_ONSTACK
(COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK,
__WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I
guess workqueues should do the same thing.

s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/
s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d87f4c14 03-Sep-2010 Tejun Heo <tj@kernel.org>

dm: implement REQ_FLUSH/FUA support for bio-based dm

This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
now deprecated REQ_HARDBARRIER.

* -EOPNOTSUPP handling logic dropped.

* Preflush is handled as before but postflush is dropped and replaced
with passing down REQ_FUA to member request_queues. This replaces
one array wide cache flush w/ member specific FUA writes.

* __split_and_process_bio() now calls __clone_and_map_flush() directly
for flushes and guarantees all FLUSH bio's going to targets are zero
` length.

* It's now guaranteed that all FLUSH bio's which are passed onto dm
targets are zero length. bio_empty_barrier() tests are replaced
with REQ_FLUSH tests.

* Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

* Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
enough to be marked with unlikely().

* Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
capability.

* Request based dm isn't converted yet. dm_init_request_based_queue()
resets flush support to 0 for now. To avoid disturbing request
based dm code, dm->flush_error is added for bio based dm while
requested based dm continues to use dm->barrier_error.

Lightly tested linear, stripe, raid1, snap and crypt targets. Please
proceed with caution as I'm not familiar with the code base.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 87c961cb 11-Aug-2010 Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>

dm snapshot: persistent use define for disk header chunk size

This patch fixes hard-coded value for the size of a chunk that includes
disk header for persistent snapshot. It should be changed to existing
macro NUM_SNAPSHOT_HDR_CHUNKS instead of using hard-coded value 1.

Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 55f67f2d 16-Feb-2010 Mike Snitzer <snitzer@redhat.com>

dm snapshot: persistent annotate work_queue as on stack

chunk_io() declares its 'struct mdata_req' on the stack and then
initializes its 'struct work_struct' member. Annotate the
initialization of this workqueue with INIT_WORK_ON_STACK to suppress a
debugobjects warning seen when CONFIG_DEBUG_OBJECTS_WORK is enabled.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 4454a621 10-Dec-2009 Mikulas Patocka <mpatocka@redhat.com>

dm exception store: add merge specific methods

Add functions that decide how many consecutive chunks of snapshot to
merge back into the origin next and to update the metadata afterwards.

prepare_merge provides a pointer to the most recent still-to-be-merged
chunk and returns how many previous ones are consecutive and can be
processed together.

commit_merge removes the nr_merged most-recent chunks permanently from
the exception store. The number must not exceed that returned by
prepare_merge.

Introduce NUM_SNAPSHOT_HDR_CHUNKS to show where the snapshot header
chunk is accounted for.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# fc56f6fb 10-Dec-2009 Mike Snitzer <snitzer@redhat.com>

dm snapshot: move cow ref from exception store to snap core

Store the reference to the snapshot cow device in the core snapshot
code instead of each exception store. It can be accessed through the
new function dm_snap_cow(). Exception stores should each now maintain a
reference to their parent snapshot struct.

This is cleaner and makes part of the forthcoming snapshot merge code simpler.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>


# 985903bb 10-Dec-2009 Mike Snitzer <snitzer@redhat.com>

dm snapshot: add allocated metadata to snapshot status

Add number of sectors used by metadata to the end of the snapshot's status
line.

Renamed dm_exception_store_type's 'fraction_full' to 'usage'. Renamed
arguments to be clearer about what is being returned. Also added
'metadata_sectors'.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 1d4989c8 10-Dec-2009 Jon Brassow <jbrassow@redhat.com>

dm snapshot: rename dm_snap_exception to dm_exception

The exception structure is not necessarily just a snapshot
element (especially after we pull it out of dm-snap.c).

Renaming appropriately.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# f5acc834 10-Dec-2009 Jon Brassow <jbrassow@redhat.com>

dm snapshot: avoid else clause in persistent_read_metadata

Minor code touch-up. We don't need the 'else'.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# df96eee6 16-Oct-2009 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: use unsigned integer chunk size

Use unsigned integer chunk size.

Maximum chunk size is 512kB, there won't ever be need to use 4GB chunk size,
so the number can be 32-bit. This fixes compiler failure on 32-bit systems
with large block devices.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# ae0b7448 04-Sep-2009 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: fix on disk chunk size validation

Fix some problems seen in the chunk size processing when activating a
pre-existing snapshot.

For a new snapshot, the chunk size can either be supplied by the creator
or a default value can be used. For an existing snapshot, the
chunk size in the snapshot header on disk should always be used.

If someone attempts to load an existing snapshot and has the 'default
chunk size' option set, the kernel uses its default value even when it
is incorrect for the snapshot being loaded. This patch ensures the
correct on-disk value is always used.

Secondly, when the code does use the chunk size stored on the disk it is
prudent to revalidate it, so the code can exit cleanly if it got
corrupted as happened in
https://bugzilla.redhat.com/show_bug.cgi?id=461506 .

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 61578dcd 04-Sep-2009 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: fix header corruption race on invalidation

If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting). This patch fixes the race.

When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"
location.

Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being
invalidated.

The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).

The fix is to use separate memory areas for each.

See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 02d2fd31 04-Sep-2009 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: refactor zero_disk_area to use chunk_io

Refactor chunk_io to prepare for the fix in the following patch.

Pass an area pointer to chunk_io and simplify zero_disk_area to use
chunk_io. No functional change.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 2bd02345 22-Jun-2009 Mikulas Patocka <mpatocka@redhat.com>

dm snapshot: use barrier when writing exception store

Send barrier requests when updating the exception area.

Exception area updates need to be ordered w.r.t. data writes, so that
the writes are not reordered in hardware disk cache.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# e1defc4f 22-May-2009 Martin K. Petersen <martin.petersen@oracle.com>

block: Do away with the notion of hardsect_size

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block sized used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# a32079ce 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm snapshot: persistent fix dtr cleanup

The persistent exception store destructor does not properly
account for all conditions in which it can be called. If it
is called after 'ctr' but before 'read_metadata' (e.g. if
something else in 'snapshot_ctr' fails) then it will attempt
to free areas of memory that haven't been allocated yet.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 1e302a92 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm snapshot: move status to exception store

Let the exception store types print out their status through
the new API, rather than having the snapshot code do it.

Adjust the buffer position to allow for the preceding DMEMIT in the
arguments to type->status().

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 71fab00a 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm snapshot: remove dm_snap header use

Move useful functions out of dm-snap.h and stop using dm-snap.h.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 49beb2b8 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm exception store: move cow pointer

Move COW device from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# d0216849 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm exception store: move chunk_fields

Move chunk fields from snapshot to exception store.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 493df71c 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm exception store: introduce registry

Move exception stores into a registry.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# b2a11465 02-Apr-2009 Jonathan Brassow <jbrassow@redhat.com>

dm exception store: separate type from instance

Introduce struct dm_exception_store_type.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# a159c1ac 05-Jan-2009 Jonathan Brassow <jbrassow@redhat.com>

dm snapshot: extend exception store functions

Supply dm_add_exception as a callback to the read_metadata function.
Add a status function ready for a later patch and name the functions
consistently.

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 4db6bfe0 05-Jan-2009 Alasdair G Kergon <agk@redhat.com>

dm snapshot: split out exception store implementations

Move the existing snapshot exception store implementations out into
separate files. Later patches will place these behind a new
interface in preparation for alternative implementations.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>