History log of /linux-master/drivers/md/dm-writecache.c
Revision Date Author Comments
# fa34e589 07-Feb-2024 Mike Snitzer <snitzer@kernel.org>

dm: update relevant MODULE_AUTHOR entries to latest dm-devel mailing list

Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 6e5f0f63 23-Jan-2024 Hongyu Jin <hongyu.jin@unisoc.com>

dm io: Support IO priority

Some IO will dispatch from kworker with different io_context settings
than the submitting task, we may need to specify a priority to avoid
losing priority.

Add IO priority parameter to dm_io() and update all callers.

Co-developed-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Yibin Ding <yibin.ding@unisoc.com>
Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 40ef8756 09-Jan-2024 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: allow allocations larger than 2GiB

The function kvmalloc_node limits the allocation size to INT_MAX. This
limit will be overflowed if dm-writecache attempts to map a device with
1TiB or larger length. This commit changes kvmalloc_array to vmalloc_array
to avoid the limit.

The commit also changes vmalloc(array_size()) to vmalloc_array().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3664ff82 09-Apr-2023 Yangtao Li <frank.li@vivo.com>

dm: add helper macro for simple DM target module init and exit

Eliminate duplicate boilerplate code for simple modules that contain
a single DM target driver without any additional setup code.

Add a new module_dm() macro, which replaces the module_init() and
module_exit() with template functions that call dm_register_target()
and dm_unregister_target() respectively.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# b362c733 18-Mar-2023 Yangtao Li <frank.li@vivo.com>

dm: push error reporting down to dm_register_target()

Simplifies each DM target's init method by making dm_register_target()
responsible for its error reporting (on behalf of targets).

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 8d1058fb 07-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: fix use of sizeof() macro

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 1c3fe2fa 07-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid useless 'else' after 'break' or return'

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2d0f25cb 02-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: remove unnecessary braces from single statement blocks

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 0ef0b471 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing empty lines

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 03b18887 30-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: fix trailing statements

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 43be9c74 30-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: fix undue/missing spaces

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 255e2646 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: address indent/space issues

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# d715fa23 01-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid assignment in if conditions

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 86a3238c 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: change "unsigned" to "unsigned int"

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3bd94003 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing SPDX-License-Indentifiers

'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.

Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# b7f362d6 08-Aug-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix smatch warning about invalid return from writecache_map

There's a smatch warning "inconsistent returns '&wc->lock'" in
dm-writecache. The reason for the warning is that writecache_map()
doesn't drop the lock on the impossible path.

Fix this warning by adding wc_unlock() after the BUG statement (so
that it will be compiled-away anyway).

Fixes: df699cc16ea5e ("dm writecache: report invalid return from writecache_map helpers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2ee73ef6 11-Jul-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: count number of blocks discarded, not number of discard bios

Change dm-writecache, so that it counts the number of blocks discarded
instead of the number of discard bios. Make it consistent with the
read and write statistics counters that were changed to count the
number of blocks instead of bios.

Fixes: e3a35d03407c ("dm writecache: add event counters")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# b2676e14 11-Jul-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: count number of blocks written, not number of write bios

Change dm-writecache, so that it counts the number of blocks written
instead of the number of write bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d03407c ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2c6e755b 11-Jul-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: count number of blocks read, not number of read bios

Change dm-writecache, so that it counts the number of blocks read
instead of the number of read bios. Bios can be split and requeued
using the dm_accept_partial_bio function, so counting bios caused
inaccurate results.

Fixes: e3a35d03407c ("dm writecache: add event counters")
Reported-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 9bc0c92e 11-Jul-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: return void from functions

The functions writecache_map_remap_origin and writecache_bio_copy_ssd
only return a single value, thus they can be made to return void.

This helps simplify the following IO accounting changes.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# ca7dc242 13-Jul-2022 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: set a default MAX_WRITEBACK_JOBS

dm-writecache has the capability to limit the number of writeback jobs
in progress. However, this feature was off by default. As such there
were some out-of-memory crashes observed when lowering the low
watermark while the cache is full.

This commit enables writeback limit by default. It is set to 256MiB or
1/16 of total system memory, whichever is smaller.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 581075e4 14-Jul-2022 Bart Van Assche <bvanassche@acm.org>

dm/core: Reduce the size of struct dm_io_request

Combine the bi_op and bi_op_flags into the bi_opf member. Use the new
blk_opf_t type to improve static type checking. This patch does not
change any functionality.

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-22-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e511c4a3 13-May-2022 Jane Chu <jane.chu@oracle.com>

dax: introduce DAX_RECOVERY_WRITE dax access mode

Up till now, dax_direct_access() is used implicitly for normal
access, but for the purpose of recovery write, dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 609be106 24-Jan-2022 Christoph Hellwig <hch@lst.de>

block: pass a block_device and opf to bio_alloc_bioset

Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5d2a228b 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dm: make the DAX support depend on CONFIG_FS_DAX

The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax. Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# f635237a 21-Oct-2021 Cai Huoqing <caihuoqing@baidu.com>

dm writecache: Make use of the helper macro kthread_run()

Replace kthread_create/wake_up_process() with kthread_run()
to simplify the code.

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 6dcbb52c 17-Oct-2021 Christoph Hellwig <hch@lst.de>

dm: use bdev_nr_sectors and bdev_nr_bytes instead of open coding them

Use the proper helpers to read the block device size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211018101130.1838532-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8ec45662 12-Jul-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

dm: update target status functions to support IMA measurement

For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.

Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
01. cache
02. crypt
03. integrity
04. linear
05. mirror
06. multipath
07. raid
08. snapshot
09. striped
10. verity

For rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.

For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.

/etc/ima/ima-policy
measure func=CRITICAL_DATA label=device-mapper template=ima-buf

The measurements will be reflected in the IMA logs, which are located at:

/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements

These IMA logs can later be consumed by various attestation clients
running on the system, and send them to external services for attesting
the system.

The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# e3a35d03 27-Jul-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: add event counters

Add 10 counters for various events (hit, miss, etc) and export them in
the status line (accessed from userspace with "dmsetup status"). Also
add a message "clear_stats" that resets these counters.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# df699cc1 27-Jul-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: report invalid return from writecache_map helpers

If some "writecache_map_*" function returns invalid state, it is a bug.
So, we should report it and not fail silently.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 15cb6f39 12-Jul-2021 Mike Snitzer <snitzer@redhat.com>

dm writecache: further writecache_map() cleanup

Factor out writecache_map_flush() and writecache_map_discard() from
writecache_map(). Also eliminate the various goto labels in
writecache_map().

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4d020b3a 12-Jul-2021 Mike Snitzer <snitzer@redhat.com>

dm writecache: factor out writecache_map_remap_origin()

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# cdd4d783 12-Jul-2021 Mike Snitzer <snitzer@redhat.com>

dm writecache: split up writecache_map() to improve code readability

writecache_map() has grown too large and can be confusing to read given
all the goto statements.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 18a6234c 26-Jul-2021 Christoph Hellwig <hch@lst.de>

dm-writecache: use bvec_kmap_local instead of bvec_kmap_irq

There is no need to disable interrupts in bio_copy_block, and the local
only mappings helps to avoid any sort of problems with stray writes
into the bio data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20210727055646.118787-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5c0de3d7 28-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: make writeback pause configurable

Commit 95b88f4d71cb953e02206be3c757083601391a0f ("dm writecache: pause
writeback if cache full and origin being written directly") introduced a
code that pauses cache flushing if we are issuing writes directly to the
origin.

Improve that initial commit by making the timeout code configurable
(via the option "pause_writeback"). Also change the default from 1s to
3s because it performed better.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 95b88f4d 25-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: pause writeback if cache full and origin being written directly

Implementation reuses dm_io_tracker, that until now was only used by
dm-cache, to track if any writes were issued directly to the origin
(due to cache being full) within the last second. If so writeback is
paused for a second.

This change improves performance for when the cache is full and IO is
issued directly to the origin device (rather than through the cache).

Depends-on: d53f1fafec9d ("dm writecache: do direct write if the cache is full")
Suggested-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 611c3e16 21-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: add optional "metadata_only" parameter

Add a "metadata_only" parameter that when present: only metadata is
promoted to the cache. This option improves performance for heavier
REQ_META workloads (e.g. device-mapper-test-suite's "git clone and
checkout" benchmark improves from 341s to 312s).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 867de40c 21-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: write at least 4k when committing

SSDs perform badly with sub-4k writes (because they perfrorm
read-modify-write internally), so make sure writecache writes at least
4k when committing.

Fixes: 991bd8d7bc78 ("dm writecache: commit just one block, not a full page")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ee55b92a 15-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: flush origin device when writing and cache is full

Commit d53f1fafec9d086f1c5166436abefdaef30e0363 ("dm writecache: do
direct write if the cache is full") changed dm-writecache, so that it
writes directly to the origin device if the cache is full.
Unfortunately, it doesn't forward flush requests to the origin device,
so that there is a bug where flushes are being ignored.

Fix this by adding missing flush forwarding.

For PMEM mode, we fix this bug by disabling direct writes to the origin
device, because it performs better.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: d53f1fafec9d ("dm writecache: do direct write if the cache is full")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 293128b1 15-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: have ssd writeback wait if the kcopyd workqueue is busy

Make dm-writecache wait if the kcopyd workqueue is busy (as will
happen if waiting for page allocation or inside submit_bio).

This change improves performance of "mkfs.ext2" by approximately 20%
on one testbed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 8c77f1cb 09-Jun-2021 Baokun Li <libaokun1@huawei.com>

dm writecache: use list_move instead of list_del/list_add in writecache_writeback()

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 991bd8d7 06-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: commit just one block, not a full page

Some architectures have pages larger than 4k and committing a full
page causes needless overhead.

Fix this by writing a single block when committing the superblock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 620cbe40 06-Jun-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: remove unused gfp_t argument from wc_add_block()

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# af4f6cab 26-May-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: interrupt writeback if suspended

If the DM device is suspended, interrupt the writeback sequence so
that there is no excessive suspend delay.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ee50cc19 26-May-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: don't split bios when overwriting contiguous cache content

If dm-writecache overwrites existing cached data, it splits the
incoming bio into many block-sized bios. The I/O scheduler does merge
these bios into one large request but this needless splitting and
merging causes performance degradation.

Fix this by avoiding bio splitting if the cache target area that is
being overwritten is contiguous.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# c40819f2 10-Feb-2021 Julia Lawall <julia.lawall@inria.fr>

dm writecache: fix flexible_array.cocci warnings

Zero-length and one-element arrays are deprecated, see
Documentation/process/deprecated.rst
Flexible-array members should be used instead.

Generated by: scripts/coccinelle/misc/flexible_array.cocci

CC: Denis Efremov <efremov@linux.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# a8affc03 10-Mar-2021 Christoph Hellwig <hch@lst.de>

block: rename BIO_MAX_PAGES to BIO_MAX_VECS

Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been
horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop
confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d9928ac5 09-Feb-2021 Mike Snitzer <snitzer@redhat.com>

dm writecache: use bdev_nr_sectors() instead of open-coded equivalent

Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4134455f 09-Feb-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix writing beyond end of underlying device when shrinking

Do not attempt to write any data beyond the end of the underlying data
device while shrinking it.

The DM writecache device must be suspended when the underlying data
device is shrunk.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 054bee16 04-Feb-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: return the exact table values that were set

LVM doesn't like it when the target returns different values from what
was set in the constructor. Fix dm-writecache so that the returned
table values are exactly the same as requested values.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 21ec672e 25-Jan-2021 Tian Tao <tiantao6@hisilicon.com>

dm writecache: fix unnecessary NULL check warnings

Remove NULL checks before vfree() to fix these warnings:
./drivers/md/dm-writecache.c:2008:2-7: WARNING: NULL check before some
freeing functions is not needed.
./drivers/md/dm-writecache.c:2024:2-7: WARNING: NULL check before some
freeing functions is not needed.

Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# cb728484 23-Jan-2021 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix performance degradation in ssd mode

Fix a thinko in ssd_commit_superblock. region.count is in sectors, not
bytes. This bug doesn't corrupt data, but it causes performance
degradation.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: dc8a01ae1dbd ("dm writecache: optimize superblock write")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 857c4c0a 13-Nov-2020 Mike Snitzer <snitzer@redhat.com>

dm writecache: remove BUG() and fail gracefully instead

Building on arch/s390/ results in this build error:

cc1: some warnings being treated as errors
../drivers/md/dm-writecache.c: In function 'persistent_memory_claim':
../drivers/md/dm-writecache.c:323:1: error: no return statement in function returning non-void [-Werror=return-type]

Fix this by replacing the BUG() with an -EOPNOTSUPP return.

Fixes: 48debafe4f2f ("dm: add writecache target")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 67aa3ec3 10-Nov-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix the maximum number of arguments

Advance the maximum number of arguments to 16.
This fixes issue where certain operations, combined with table
configured args, exceed 10 arguments.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# e5d41cbc 10-Nov-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: advance the number of arguments when reporting max_age

When reporting the "max_age" value the number of arguments must
advance by two.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 3923d4854e18 ("dm writecache: implement gradual cleanup")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ec6347bb 05-Oct-2020 Dan Williams <dan.j.williams@intel.com>

x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()

In reaction to a proposal to introduce a memcpy_mcsafe_fast()
implementation Linus points out that memcpy_mcsafe() is poorly named
relative to communicating the scope of the interface. Specifically what
addresses are valid to pass as source, destination, and what faults /
exceptions are handled.

Of particular concern is that even though x86 might be able to handle
the semantics of copy_mc_to_user() with its common copy_user_generic()
implementation other archs likely need / want an explicit path for this
case:

On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > However now I see that copy_user_generic() works for the wrong reason.
> > It works because the exception on the source address due to poison
> > looks no different than a write fault on the user address to the
> > caller, it's still just a short copy. So it makes copy_to_user() work
> > for the wrong reason relative to the name.
>
> Right.
>
> And it won't work that way on other architectures. On x86, we have a
> generic function that can take faults on either side, and we use it
> for both cases (and for the "in_user" case too), but that's an
> artifact of the architecture oddity.
>
> In fact, it's probably wrong even on x86 - because it can hide bugs -
> but writing those things is painful enough that everybody prefers
> having just one function.

Replace a single top-level memcpy_mcsafe() with either
copy_mc_to_user(), or copy_mc_to_kernel().

Introduce an x86 copy_mc_fragile() name as the rename for the
low-level x86 implementation formerly named memcpy_mcsafe(). It is used
as the slow / careful backend that is supplanted by a fast
copy_mc_generic() in a follow-on patch.

One side-effect of this reorganization is that separating copy_mc_64.S
to its own file means that perf no longer needs to track dependencies
for its memcpy_64.S benchmarks.

[ bp: Massage a bit. ]

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <stable@vger.kernel.org>
Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com


# f9e040ef 24-Aug-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: handle DAX to partitions on persistent memory correctly

The function dax_direct_access doesn't take partitions into account,
it always maps pages from the beginning of the device. Therefore,
persistent_memory_claim() must get the partition offset using
get_start_sect() and add it to the page offsets passed to
dax_direct_access().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# 3e79f082 30-Jun-2020 Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

Architectures like ppc64 provide persistent memory specific barriers
that will ensure that all stores for which the modifications are
written to persistent storage by preceding dcbfps and dcbstps
instructions have updated persistent storage before any data
access or data transfer caused by subsequent instructions is initiated.
This is in addition to the ordering done by wmb()

Update nvdimm core such that architecture can use barriers other than
wmb to ensure all previous writes are architecturally visible for
the platform buffer flush.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com


# a4662458 30-Jun-2020 Michal Suchanek <msuchanek@suse.de>

dm writecache: reject asynchronous pmem devices

DM writecache does not handle asynchronous pmem. Reject it when
supplied as cache.

Link: https://lore.kernel.org/linux-nvdimm/87lfk5hahc.fsf@linux.ibm.com/
Fixes: 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # 5.3+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ed00aabd 01-Jul-2020 Christoph Hellwig <hch@lst.de>

block: rename generic_make_request to submit_bio_noacct

generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d35bd764 19-Jun-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: add cond_resched to loop in persistent_memory_claim()

Add cond_resched() to a loop that fills in the mapper memory area
because the loop can be executed many times.

Fixes: 48debafe4f2fe ("dm: add writecache target")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# a143e172 12-Jun-2020 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: skip writecache_wait when using pmem mode

The array bio_in_progress is only used with ssd mode. So skip
writecache_wait_for_ios in writecache_discard when pmem mode.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 39495b12 12-Jun-2020 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: correct uncommitted_block when discarding uncommitted entry

When uncommitted entry has been discarded, correct wc->uncommitted_block
for getting the exact number.

Fixes: 48debafe4f2fe ("dm: add writecache target")
Cc: stable@vger.kernel.org
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 48338daa 28-Apr-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: improve performance on DDR persistent memory (Optane)

When testing the dm-writecache target on a real DDR persistent memory
(Intel Optane), it turned out that explicit cache flushing using the
clflushopt instruction performs better than non-temporal stores for
block sizes 1k, 2k and 4k.

The dm-writecache target is singlethreaded (all the copying is done
while holding the writecache lock), so it benefits from clwb, see:
http://lore.kernel.org/r/alpine.LRH.2.02.2004160411460.7833@file01.intranet.prod.int.rdu2.redhat.com

Add a new function memcpy_flushcache_optimized() that tests if
clflushopt is present - and if it is, we use it instead of
memcpy_flushcache.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 499c1804 19-Apr-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: remove superfluous test in persistent_memory_claim

Remove superfluous test if dax_dev is NULL - dax_direct_access already
does this test.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 31b22120 15-Apr-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix data corruption when reloading the target

The dm-writecache reads metadata in the target constructor. However, when
we reload the target, there could be another active instance running on
the same device. This is the sequence of operations when doing a reload:

1. construct new target
2. suspend old target
3. resume new target
4. destroy old target

Metadata that were written by the old target between steps 1 and 2 would
not be visible by the new target.

Fix the data corruption by loading the metadata in the resume handler.

Also, validate block_size is at least as large as both the devices'
logical block size and only read 1 block from the metadata during
target constructor -- no need to read entirety of metadata now that it
is done during resume.

Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 1edaa447 27-Mar-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: add cond_resched to avoid CPU hangs

Initializing a dm-writecache device can take a long time when the
persistent memory device is large. Add cond_resched() to a few loops
to avoid warnings that the CPU is stuck.

Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# dc8a01ae 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: optimize superblock write

If we write a superblock in writecache_flush, we don't need to set bit and
scan the bitmap for it - we can just write the superblock directly. Also,
we can set the flag REQ_FUA on the write bio, so that we don't need to
submit a flush bio afterwards.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 3923d485 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: implement gradual cleanup

If a block is stored in the cache for too long, it will now be
written to the underlying device and cleaned up.

Add a new option "max_age" that specifies the maximum age of a block
in milliseconds.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 93de44eb 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: implement the "cleaner" policy

The "flush" or "flush_on_suspend" messages flush the whole cache. However,
these flushing methods can take some time and the process is left in
an interruptible state during the flush.

Implement a "cleaner" option that offers an alternate flushing method.
When this option is activated (either by a message or in the constructor
arguments), the cache will not promote new writes (however, writes to
already cached blocks are promoted, to avoid data corruption due to
misordered writes) and it will gradually writeback any cached data. The
userspace can then monitor the cleaning process with "dmsetup status".
When the number of cached bloks drops to zero, the userspace can unload
the dm-writecache target and replace it with dm-linear or other targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# d53f1faf 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: do direct write if the cache is full

If the cache device is full, we do a direct write to the origin device.
Note that we must not do it if the written block is already in the cache.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 636be424 27-Feb-2020 Mike Snitzer <snitzer@redhat.com>

dm: bump version of core and various targets

Changes made during the 5.6 cycle warrant bumping the version number
for DM core and the targets modified by this commit.

It should be noted that dm-thin, dm-crypt and dm-raid already had
their target version bumped during the 5.6 merge window.

Signed-off-by; Mike Snitzer <snitzer@redhat.com>


# 41c526c5 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: verify watermark during resume

Verify the watermark upon resume - so that if the target is reloaded
with lower watermark, it will start the cleanup process immediately.

Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# adc0daad 24-Feb-2020 Mikulas Patocka <mpatocka@redhat.com>

dm: report suspended device during destroy

The function dm_suspended returns true if the target is suspended.
However, when the target is being suspended during unload, it returns
false.

An example where this is a problem: the test "!dm_suspended(wc->ti)" in
writecache_writeback is not sufficient, because dm_suspended returns
zero while writecache_suspend is in progress. As is, without an
enhanced dm_suspended, simply switching from flush_workqueue to
drain_workqueue still emits warnings:
workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries

writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function
flushes the current work. However, the workqueue may re-queue itself and
flush_workqueue doesn't wait for re-queued works to finish. Because of
this - the function writecache_writeback continues execution after the
device was suspended and then concurrently with writecache_dtr, causing
a crash in writecache_writeback.

We must use drain_workqueue - that waits until the work and all re-queued
works finish.

As a prereq for switching to drain_workqueue, this commit fixes
dm_suspended to return true after the presuspend hook and before the
postsuspend hook - just like during a normal suspend. It allows
simplifying the dm-integrity and dm-writecache targets so that they
don't have to maintain suspended flags on their own.

With this change use of drain_workqueue() can be used effectively. This
change was tested with the lvm2 testsuite and cryptsetup testsuite and
the are no regressions.

Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Reported-by: Corey Marthaler <cmarthal@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# dcd19507 15-Jan-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: improve performance of large linear writes on SSDs

When dm-writecache is used with SSD as a cache device, it would submit a
separate bio for each written block. The I/Os would be merged by the disk
scheduler, but this merging degrades performance.

Improve dm-writecache performance by submitting larger bios - this is
possible as long as there is consecutive free space on the cache
device.

Benchmark (arm64 with 64k page size, using /dev/ram0 as a cache device):

fio --bs=512k --iodepth=32 --size=400M --direct=1 \
--filename=/dev/mapper/cache --rw=randwrite --numjobs=1 --name=test

block old new
size MiB/s MiB/s
---------------------
512 181 700
1k 347 1256
2k 644 2020
4k 1183 2759
8k 1852 3333
16k 2469 3509
32k 2974 3670
64k 3404 3810

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# aa950920 08-Jan-2020 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix incorrect flush sequence when doing SSD mode commit

When committing state, the function writecache_flush does the following:
1. write metadata (writecache_commit_flushed)
2. flush disk cache (writecache_commit_flushed)
3. wait for data writes to complete (writecache_wait_for_ios)
4. increase superblock seq_count
5. write the superblock
6. flush disk cache

It may happen that at step 3, when we wait for some write to finish, the
disk may report the write as finished, but the write only hit the disk
cache and it is not yet stored in persistent storage. At step 5 we write
the superblock - it may happen that the superblock is written before the
write that we waited for in step 3. If the machine crashes, it may result
in incorrect data being returned after reboot.

In order to fix the bug, we must swap steps 2 and 3 in the above sequence,
so that we first wait for writes to complete and then flush the disk
cache.

Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# c1005322 23-Oct-2019 Maged Mokhtar <mmokhtar@petasan.org>

dm writecache: handle REQ_FUA

Call writecache_flush() on REQ_FUA in writecache_map().

Cc: stable@vger.kernel.org # 4.18+
Signed-off-by: Maged Mokhtar <mmokhtar@petasan.org>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 8dd85873 02-Oct-2019 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix uninitialized variable warning

This fixes coverity warning CID 1454301.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 6d195913 02-Sep-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: skip writecache_wait for pmem mode

The array bio_in_progress[2] only have chance to be increased and
decreased with ssd mode. For pmem mode, they are not involved at all.
So skip writecache_wait_for_ios in writecache_flush for pmem.

Suggested-by: Doris Yu <tyu1@lenovo.com>
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 5229b489 25-Aug-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: optimize performance by sorting the blocks for writeback_all

During the process of writeback, the blocks, which have been placed in wbl.list
for writeback soon, are partially ordered for the contiguous ones.

When writeback_all has been set, for most cases, also by default, there will be
a lot of blocks in pmem need to writeback at the same time.
For this case, we could optimize the performance by sorting all blocks in
wbl.list. writecache_writeback doesn't need to get blocks from the tail of
wc->lru, whereas from the first rb_node from the rb_tree.

The benefit is that, writecache_writeback doesn't need to have any cost to sort
the blocks, because of all blocks are incremental originally in rb_tree.
There will be a writecache_flush when writeback_all begins to work, that will
eliminate duplicate blocks in cache by committed/uncommitted.

Testing platform: Thinksystem SR630 with persistent memory.
The cache comes from pmem, which has 1006MB size. The origin device is HDD, 2GB
of which for using.

Testing steps:
1) dmsetup create mycache --table '0 4194304 writecache p /dev/sdb1 /dev/pmem4 4096 0'
2) fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite
-ioengine=libaio -bs=4k -loops=1 -size=2g -group_reporting -name=mytest1
3) time dmsetup message /dev/mapper/mycache 0 flush

Here is the results below,
With the patch:
# fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite
-ioengine=libaio -bs=4k -loops=1 -size=2g -group_reporting -name=mytest1
iops : min= 1582, max=199470, avg=5305.94, stdev=21273.44, samples=197
# time dmsetup message /dev/mapper/mycache 0 flush
real 0m44.020s
user 0m0.002s
sys 0m0.003s

Without the patch:
# fio -filename=/dev/mapper/mycache -direct=1 -iodepth=20 -rw=randwrite
-ioengine=libaio -bs=4k -loops=1 -size=2g -group_reporting -name=mytest1
iops : min= 1202, max=197650, avg=4968.67, stdev=20480.17, samples=211
# time dmsetup message /dev/mapper/mycache 0 flush
real 1m39.221s
user 0m0.001s
sys 0m0.003s

I also have checked the data accuracy with this patch by making EXT4 filesystem
on mycache, then mount it for checking md5 of files on that.
The test result is positive, with this patch it could save more than half of time
when writeback_all.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 62421b38 25-Aug-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: add unlikely for getting two block with same LBA

In function writecache_writeback, entries g and f has same original
sector only happens at entry f has been committed, but entry g has
NOT yet.

The probability of this happening is very low in the following
256 blocks at most of entry e.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 58912dbc 25-Aug-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: remove unused member pointer in writeback_struct

The stucture member pointer page in writeback_struct never has been
used actually. Remove it.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# f8011d33 26-Apr-2019 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: avoid unnecessary lookups in writecache_find_entry()

This is a small optimization in writecache_find_entry().

If we go past the condition "if (unlikely(!node))", we can be certain that
there is no entry in the tree that has the block equal to the "block"
variable.

Consequently, we can return the next entry directly, we don't need to go
to the second part of the function that finds the entry with lowest or
highest seq number that matches the "block" variable.

Also, add some whitespace and cleanup needless braces.

Suggested-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 08a8e804 25-Apr-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: remove unused member page_offset in writeback_struct

The stucture member page_offset in writeback_struct never has been
used actually. Remove it.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 84420b1e 12-Apr-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: add unlikely for returned value of rb_next/prev

In functions writecache_discard() and writecache_find_entry() there is a
high probablity that the pointer of structure rb_node won't equal NULL.
Add unlikely for the pointer node NULL.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 09f2d656 12-Apr-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: remove needless dereferences in __writecache_writeback_pmem()

bio is already available so there is no need to access it in terms of
the wb pointer.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# f87e033b 20-Feb-2019 Huaisheng Ye <yehs1@lenovo.com>

dm writecache: fix typo in name for writeback_wq

The workqueue's name should be "writecache-writeback" instead of
"writecache-writeabck".

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# e8ea141a 24-Oct-2018 Shenghui Wang <shhuiw@foxmail.com>

dm writecache: fix typo in error msg for creating writecache_flush_thread

The error msg should be "flush thread" instead of "endio thread"
for writecache_flush_thread.

Signed-off-by: Shenghui Wang <shhuiw@foxmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# da4ad3a2 22-Oct-2018 Mike Snitzer <snitzer@redhat.com>

dm writecache: remove disabled code in memory_entry()

This dead branch was missed during review. It only makes memory_entry()
more inefficient due to needless call to is_power_of_2(), etc.

Reported-by: shenghui <shhuiw@foxmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 1e1132ea 15-Aug-2018 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: fix a crash due to reading past end of dirty_bitmap

wc->dirty_bitmap_size is in bytes so must multiply it by 8, not by
BITS_PER_LONG, to get number of bitmap_bits.

Fixes crash in find_next_bit() that was reported:
https://bugzilla.kernel.org/show_bug.cgi?id=200819

Reported-by: edo.rus@gmail.com
Fixes: 48debafe4f2f ("dm: add writecache target")
Cc: stable@vger.kernel.org # 4.18
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# f742267a 30-Jul-2018 Huaisheng Ye <yehs1@lenovo.com>

md/dm-writecache: Don't request pointer dummy_addr when not required

Function persistent_memory_claim doesn't need to get local pointer
dummy_addr from direct_access. Using NULL instead of having to pass
in a useless local pointer that caller then just throw away.

Suggested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>


# 9ff07e7d 25-Jul-2018 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: report start_sector in status line

Fixes: d284f8248c7 ("dm writecache: support optional offset for start of device")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# d284f824 28-Jun-2018 Mikulas Patocka <mpatocka@redhat.com>

dm writecache: support optional offset for start of device

Add an optional parameter "start_sector" to allow the start of the
device to be offset by the specified number of 512-byte sectors. The
sectors below this offset are not used by the writecache device and are
left to be used for disk labels and/or userspace metadata (e.g. lvm).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 50a7d3ba 18-Jun-2018 Kees Cook <keescook@chromium.org>

dm writecache: use 2-factor allocator arguments

This adjusts the allocator calls to use the 2-factor argument style, as
already done treewide for better defense against allocator overflows.

Signed-off-by: Kees Cook <keescook@chromium.org>
[snitzer: tweaked code to leave assignment in a test alone]
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 48debafe 08-Mar-2018 Mikulas Patocka <mpatocka@redhat.com>

dm: add writecache target

The writecache target caches writes on persistent memory or SSD.
It is intended for databases or other programs that need extremely low
commit latency.

The writecache target doesn't cache reads because reads are supposed to
be cached in page cache in normal RAM.

If persistent memory isn't available this target can still be used in
SSD mode.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com> # fix missing goto
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> # fix compilation issue with !DAX
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # use msecs_to_jiffies
Acked-by: Dan Williams <dan.j.williams@intel.com> # reworks to unify ARM and x86 flushing
Signed-off-by: Mike Snitzer <msnitzer@redhat.com>