History log of /linux-master/drivers/md/dm-stripe.c
Revision Date Author Comments
# f1445032 18-Sep-2023 Mikulas Patocka <mpatocka@redhat.com>

dm: shortcut the calls to linear_map and stripe_map

Shortcut the calls to linear_map and stripe_map, so that they don't suffer
the overhead of retpolines used for indirect calls.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 694b3b9d 15-Sep-2023 Kees Cook <keescook@chromium.org>

dm: Annotate struct stripe_c with __counted_by

Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).

As found with Coccinelle[1], add __counted_by for struct stripe_c.

[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci

Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: dm-devel@redhat.com
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20230915200352.never.118-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>


# 26cb62a2 16-Mar-2023 Yu Zhe <yuzhe@nfschina.com>

dm: remove unnecessary (void*) conversions

Pointer variables of void * type do not require type cast.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# a7e8f7fb 19-Apr-2022 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

dm: update targets using system workqueues to use a local workqueue

Flushing system-wide workqueues is dangerous and will be forbidden.
Use a local workqueue in dm-mpath.c, dm-raid1.c, and dm-stripe.c.

Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 1c3fe2fa 07-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid useless 'else' after 'break' or return'

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2e84fecf 03-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: avoid split of quoted strings where possible

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 2d0f25cb 02-Feb-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: remove unnecessary braces from single statement blocks

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 86a3238c 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: change "unsigned" to "unsigned int"

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 3bd94003 25-Jan-2023 Heinz Mauelshagen <heinzm@redhat.com>

dm: add missing SPDX-License-Indentifiers

'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has
deprecated its use.

Suggested-by: John Wiele <jwiele@redhat.com>
Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>


# 047218ec 22-Apr-2022 Jane Chu <jane.chu@oracle.com>

dax: add .recovery_write dax_operation

Introduce dax_recovery_write() operation. The function is used to
recover a dax range that contains poison. Typical use case is when
a user process receives a SIGBUS with si_code BUS_MCEERR_AR
indicating poison(s) in a dax range, in response, the user process
issues a pwrite() to the page-aligned dax range, thus clears the
poison and puts valid data in the range.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e511c4a3 13-May-2022 Jane Chu <jane.chu@oracle.com>

dax: introduce DAX_RECOVERY_WRITE dax access mode

Up till now, dax_direct_access() is used implicitly for normal
access, but for the purpose of recovery write, dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# a773187e 09-Feb-2022 Christoph Hellwig <hch@lst.de>

scsi: dm: Remove WRITE_SAME support

There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# 7ac5360c 15-Dec-2021 Christoph Hellwig <hch@lst.de>

dax: remove the copy_from_iter and copy_to_iter methods

These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.

Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove indirect call from the read/write fast path
as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 2a68553e 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dm-stripe: add a stripe_dax_pgoff helper

Add a helper to perform the entire remapping for DAX accesses. This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-12-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 5d2a228b 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dm: make the DAX support depend on CONFIG_FS_DAX

The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax. Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 8ec45662 12-Jul-2021 Tushar Sugandhi <tusharsu@linux.microsoft.com>

dm: update target status functions to support IMA measurement

For device mapper targets to take advantage of IMA's measurement
capabilities, the status functions for the individual targets need to be
updated to handle the status_type_t case for value STATUSTYPE_IMA.

Update status functions for the following target types, to log their
respective attributes to be measured using IMA.
01. cache
02. crypt
03. integrity
04. linear
05. mirror
06. multipath
07. raid
08. snapshot
09. striped
10. verity

For rest of the targets, handle the STATUSTYPE_IMA case by setting the
measurement buffer to NULL.

For IMA to measure the data on a given system, the IMA policy on the
system needs to be updated to have the following line, and the system
needs to be restarted for the measurements to take effect.

/etc/ima/ima-policy
measure func=CRITICAL_DATA label=device-mapper template=ima-buf

The measurements will be reflected in the IMA logs, which are located at:

/sys/kernel/security/integrity/ima/ascii_runtime_measurements
/sys/kernel/security/integrity/ima/binary_runtime_measurements

These IMA logs can later be consumed by various attestation clients
running on the system, and send them to external services for attesting
the system.

The DM target data measured by IMA subsystem can alternatively
be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with
DM_TABLE_STATUS_CMD.

Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 410fe220 12-Nov-2020 Jeffle Xu <jefflexu@linux.alibaba.com>

dm: add support for REQ_NOWAIT to various targets

commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added a new queue
flag QUEUE_FLAG_NOWAIT to advertise if the bdev supports handling of
REQ_NOWAIT or not. DM core supports stacking QUEUE_FLAG_NOWAIT since
commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable it for
linear target"), in which only dm-linear enabled it.

Update others DM targets, which just do simple remapping, to enable
support for REQ_NOWAIT.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# b18ae8dd 07-May-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

dm: replace zero-length array with flexible-array

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
int stuff;
struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# cdf6cdcd 28-Feb-2020 Vivek Goyal <vgoyal@redhat.com>

dm,dax: Add dax zero_page_range operation

This patch adds support for dax zero_page_range operation to dm targets.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 8adeac3b 02-Oct-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

dm stripe: use struct_size() in kmalloc()

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct stripe_c {
...
struct stripe stripe[0];
};

In this case alloc_context() and dm_array_too_big() are removed and
replaced by the direct use of the struct_size() helper in kmalloc().

Notice that open-coded form is prone to type mistakes.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# b3a9a0c3 02-May-2018 Dan Williams <dan.j.williams@intel.com>

dax: Introduce a ->copy_to_iter dax operation

Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 00716545 13-Mar-2018 Denis Semakin <d.semakin@omprussia.ru>

dm: add support for secure erase forwarding

Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's
data devices support secure erase.

Also, add support for secure erase to both the linear and striped
targets.

Signed-off-by: Denis Semakin <d.semakin@omprussia.ru>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 706dd22f 08-Mar-2018 Tycho Andersen <tycho@tycho.pizza>

dm stripe: get rid of a Variable Length Array (VLA)

Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to
the build args: https://lkml.org/lkml/2018/3/7/621

This one is a simple case, since we don't actually need the VLA at all: we
can just iterate over the stripes twice, once to emit their names, and the
second time to emit status (i.e. trade memory for time). Since the number
of stripes is probably low, this is hopefully not that expensive.

Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 976431b0 29-Mar-2018 Dan Williams <dan.j.williams@intel.com>

dax, dm: allow device-mapper to operate without dax support

Change device-mapper's DAX dependency to require the presence of at
least one DAX_DRIVER. This allows device-mapper to be built without
bringing the DAX core along which is especially wasteful when there are
no DAX drivers, like BLK_DEV_PMEM, configured.

Cc: Alasdair Kergon <agk@redhat.com>
Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# c3ca015f 31-Aug-2017 Mikulas Patocka <mpatocka@redhat.com>

dax: remove the pmem_dax_ops->flush abstraction

Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 74d46992 23-Aug-2017 Christoph Hellwig <hch@lst.de>

block: replace bi_bdev with a gendisk pointer and partitions index

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# abebfbe2 29-May-2017 Dan Williams <dan.j.williams@intel.com>

dm: add ->flush() dax operation support

Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying flush implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 7e026c8c 29-May-2017 Dan Williams <dan.j.williams@intel.com>

dm: add ->copy_from_iter() dax operation support

Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 4e4cbee9 03-Jun-2017 Christoph Hellwig <hch@lst.de>

block: switch bios to blk_status_t

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 1be56909 03-Jun-2017 Christoph Hellwig <hch@lst.de>

dm: change ->end_io calling convention

Turn the error paramter into a pointer so that target drivers can change
the value, and make sure only DM_ENDIO_* values are returned from the
methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9966afaf 03-Jun-2017 Christoph Hellwig <hch@lst.de>

dm: fix REQ_RAHEAD handling

A few (but not all) dm targets use a special EWOULDBLOCK error code for
failing REQ_RAHEAD requests that fail due to a lack of available resources.
But no one else knows about this magic code, and lower level drivers also
don't generate it when failing read-ahead requests for similar reasons.

So remove this special casing and ignore all additional error handling for
REQ_RAHEAD - if this was a real underlying error we'd get a normal read
once the real read comes in.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 817bf402 12-Apr-2017 Dan Williams <dan.j.williams@intel.com>

dm: teach dm-targets to use a dax_device + dax_operations

Arrange for dm to lookup the dax services available from member devices.
Update the dax-capable targets, linear and stripe, to route dax
operations to the underlying device. Changes the target-internal
->direct_access() method to more closely align with the dax_operations
->direct_access() calling convention.

Cc: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e2460f2a 18-Apr-2017 Mikulas Patocka <mpatocka@redhat.com>

dm: mark targets that pass integrity data

A dm-crypt on dm-integrity device incorrectly advertises an integrity
profile on the DM crypt device. It can be seen in the files
"/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target
advertise the integrity profile. That is incorrect, only the
dm-integrity target should advertise the integrity profile.

A general problem in DM is that if we have a DM device that depends on
another device with an integrity profile, the upper device will always
advertise the integrity profile, even when the target driver doesn't
support handling integrity data.

Most targets don't support integrity data, so we provide a whitelist of
targets that support it (linear, delay and striped). The targets that
support passing integrity data to the lower device are marked with the
flag DM_TARGET_PASSES_INTEGRITY. The DM core will now advertise
integrity data on a DM device only if all the targets support the
integrity data.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# ac62d620 05-Apr-2017 Christoph Hellwig <hch@lst.de>

dm: support REQ_OP_WRITE_ZEROES

Copy & paste from the REQ_OP_WRITE_SAME code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 1eff9d32 05-Aug-2016 Jens Axboe <axboe@fb.com>

block: rename bio bi_rw to bi_opf

Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>


# beec25b4 24-Jun-2016 Toshi Kani <toshi.kani@hpe.com>

dm stripe: add DAX support

Change dm-stripe to implement direct_access function,
stripe_direct_access(), which maps bdev and sector and
calls direct_access function of its physical target device.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 28a8f0d3 05-Jun-2016 Mike Christie <mchristi@redhat.com>

block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e6047149 05-Jun-2016 Mike Christie <mchristi@redhat.com>

dm: use bio op accessors

Separate the op from the rq_flag_bits and have dm
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8ae12666 28-Apr-2015 Kent Overstreet <kent.overstreet@gmail.com>

block: kill merge_bvec_fn() completely

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e80d1c80 31-Jul-2015 Vivek Goyal <vgoyal@redhat.com>

dm: do not override error code returned from dm_get_device()

Some of the device mapper targets override the error code returned by
dm_get_device() and return either -EINVAL or -ENXIO. There is nothing
gained by this override. It is better to propagate the returned error
code unchanged to caller.

This work was motivated by hitting an issue where the underlying device
was busy but -EINVAL was being returned. After this change we get
-EBUSY instead and it is easier to figure out the problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4246a0b6 20-Jul-2015 Christoph Hellwig <hch@lst.de>

block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e223e1de 27-Apr-2015 Luis Henriques <luis.henriques@canonical.com>

dm stripe: drop useless exit point from dm_stripe_init()

Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# a3f2af25 10-Oct-2014 Pavitra Kumar <pavitrak@nvidia.com>

dm stripe: fix potential for leak in stripe_ctr error path

Fix a potential struct stripe_c leak that would occur if the
chunk_size exceeded the maximum allowed by dm_set_target_max_io_len
(UINT_MAX). However, in practice there is no possibility of this
occuring given that chunk_size is of type uint32_t. But it is good to
fix this to future-proof in case dm_set_target_max_io_len's
implementation were to change.

Signed-off-by: Pavitra Kumar <pavitrak@nvidia.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 4f024f37 11-Oct-2013 Kent Overstreet <kmo@daterainc.com>

block: Abstract out bvec iterator

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6


# 7fff5e8f 06-Sep-2013 Mike Snitzer <snitzer@redhat.com>

dm stripe: silence a couple sparse warnings

Eliminate the following sparse warnings:
drivers/md/dm-stripe.c:443:12: warning: symbol 'dm_stripe_init' was not declared. Should it be static?
drivers/md/dm-stripe.c:456:6: warning: symbol 'dm_stripe_exit' was not declared. Should it be static?

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# d793e684 10-May-2013 Mike Snitzer <snitzer@redhat.com>

dm stripe: fix regression in stripe_width calculation

Fix a regression in the calculation of the stripe_width in the
dm stripe target which led to incorrect processing of device limits.

The stripe_width is the stripe device length divided by the number of
stripes. The group of commits in the range f14fa69 ("dm stripe: fix
size test") to eb850de ("dm stripe: support for non power of 2
chunksize") interfered with each other (a merging error) and led to the
stripe_width being set incorrectly to the stripe device length divided by
chunk_size * stripe_count.

For example, a stripe device's table with: 0 33553920 striped 3 512 ...
should result in a stripe_width of 11184640 (33553920 / 3), but due to
the bug it was getting set to 21845 (33553920 / (512 * 3)).

The impact of this bug is that device topologies that previously worked
fine with the stripe target are no longer considered valid. In
particular, there is a higher risk of seeing this issue if one of the
stripe devices has a 4K logical block size. Resulting in an error
message like this:
"device-mapper: table: 253:4: len=21845 not aligned to h/w logical block size 4096 of dm-1"

The fix is to swap the order of the divisions and to use a temporary
variable for the second one, so that width retains the intended
value.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.6+
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# f73a1c7d 25-Sep-2012 Kent Overstreet <koverstreet@google.com>

block: Add bio_end_sector()

Just a little convenience macro - main reason to add it now is preparing
for immutable bio vecs, it'll reduce the size of the patch that puts
bi_sector/bi_size/bi_idx into a struct bvec_iter.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Lars Ellenberg <drbd-dev@lists.linbit.com>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Neil Brown <neilb@suse.de>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
CC: Chris Mason <chris.mason@fusionio.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# 55a62eef 01-Mar-2013 Alasdair G Kergon <agk@redhat.com>

dm: rename request variables to bios

Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.

No functional changes.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# fd7c092e 01-Mar-2013 Mikulas Patocka <mpatocka@redhat.com>

dm: fix truncated status strings

Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.

When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.

However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.

If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.

In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
This is incorrect because retrieve_status doesn't check the error
code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.

This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 45e621d4 21-Dec-2012 Mike Snitzer <snitzer@redhat.com>

dm stripe: add WRITE SAME support

Rename stripe_map_discard to stripe_map_range and reuse it for WRITE
SAME bio processing.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 7de3ee57 21-Dec-2012 Mikulas Patocka <mpatocka@redhat.com>

dm: remove map_info

This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# ddbd658f 21-Dec-2012 Mikulas Patocka <mpatocka@redhat.com>

dm: move target request nr to dm_target_io

This patch moves target_request_nr from map_info to dm_target_io and
makes it accessible with dm_bio_get_target_request_nr.

This patch is a preparation for the next patch that removes map_info.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 43829731 20-Aug-2012 Tejun Heo <tj@kernel.org>

workqueue: deprecate flush[_delayed]_work_sync()

flush[_delayed]_work_sync() are now spurious. Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>


# 1f4e0ff0 27-Jul-2012 Alasdair G Kergon <agk@redhat.com>

dm thin: commit before gathering status

Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 33d07c0d 27-Jul-2012 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: optimize chunk_size calculations

dm-stripe is usually used with a chunk size that is a power of two.
Use faster shifts and bit masks in such cases.

stripe_width is already optimized in a similar way.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 8f069b41 27-Jul-2012 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: remove minimum stripe size

There is no technical limitation in device mapper that would prevent the
dm-stripe target from using a stripe size smaller than page size.

This patch removes the limit and makes stripe volumes portable across
architectures with different page size.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# eb850de6 27-Jul-2012 Mike Snitzer <snitzer@redhat.com>

dm stripe: support for non power of 2 chunksize

Support non-power-of-2 chunk sizes with dm striping for proper alignment
of stripe IO on storage that has non-power-of-2 optimal IO sizes (e.g.
RAID6 10+2).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 542f9038 27-Jul-2012 Mike Snitzer <snitzer@redhat.com>

dm: support non power of two target max_io_len

Remove the restriction that limits a target's specified maximum incoming
I/O size to be a power of 2.

Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'.
Change it from sector_t to uint32_t, which is plenty big enough, and
introduce a wrapper function dm_set_target_max_io_len() to set it.
Use sector_div() to process it now that it is not necessarily a power of 2.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 1df05483 27-Jul-2012 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: remove stripes_mask

The structure stripe_c contains a stripes_mask field. This field is
useless because it can be trivially calculated by subtracting one from
stripes. It is used only at one place. This patch removes it.

The patch also changes ffs(stripes) - 1 to __ffs(stripes).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# f14fa693 27-Jul-2012 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: fix size test

dm-stripe is supposed to ensure that all the space allocated to the
stripes is fully used and that all stripes are the same size. This
patch fixes the test. It checks that device length is divisible by the
chunk size and checks that the resulting quotient is divisible by the
number of stripes (which is equivalent to testing if device length is
divisible by chunk_size * stripes).

Previously, the code only tested that the number of sectors in the target
was divisible by each of the chunk size and the number of stripes
separately, which could leave entire stripes unused.

(A setup that genuinely needs some stripes to be shorter than others
can be created by concatenating striped targets.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 1a66a08a 27-Jul-2012 majianpeng <majianpeng@gmail.com>

dm: replace simple_strtoul

Replace obsolete simple_strtoul() with kstrtou8/kstrtouint.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 31998ef1 28-Mar-2012 Mikulas Patocka <mpatocka@redhat.com>

dm: reject trailing characters in sccanf input

Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.

For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
but also it will match number with some garbage appended, like "123abc".

As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.

This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.

The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 29915202 24-Mar-2011 Mustafa Mesanovic <mume@linux.vnet.ibm.com>

dm stripe: implement merge method

Implement a merge function in the striped target.

When the striped target's underlying devices provide a merge_bvec_fn
(like all DM devices do via dm_merge_bvec) it is important to call down
to them when building a biovec that doesn't span a stripe boundary.

Without the merge method, a striped DM device stacked on DM devices
causes bios with a single page to be submitted which results
in unnecessary overhead that hurts performance.

This change really helps filesystems (e.g. XFS and now ext4) which take
care to assemble larger bios. By implementing stripe_merge(), DM and the
stripe target no longer undermine the filesystem's work by only allowing
a single page per bio. Buffered IO sees the biggest improvement
(particularly uncached reads, buffered writes to a lesser degree). This
is especially so for more capable "enterprise" storage LUNs.

The performance improvement has been measured to be ~12-35% -- when a
reasonable chunk_size is used (e.g. 64K) in conjunction with a stripe
count that is a power of 2.

In contrast, the performance penalty is ~5-7% for the pathological worst
case stripe configuration (small chunk_size with a stripe count that is
not a power of 2). The reason for this is that stripe_map_sector() is
now called once for every call to dm_merge_bvec(). stripe_map_sector()
will use slower division if stripe count isn't a power of 2.

Signed-off-by: Mustafa Mesanovic <mume@linux.vnet.ibm.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# f521f074 13-Jan-2011 Tejun Heo <tj@kernel.org>

dm stripe: switch from local workqueue to system_wq

kstriped only serves sc->kstriped_ws which runs dm_table_event().
This doesn't need to be executed from an ordered workqueue w/ rescuer.
Drop kstriped and use the system_wq instead. While at it, rename
kstriped_ws to trigger_event so that it's consistent with other dm
modules.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# d87f4c14 03-Sep-2010 Tejun Heo <tj@kernel.org>

dm: implement REQ_FLUSH/FUA support for bio-based dm

This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
now deprecated REQ_HARDBARRIER.

* -EOPNOTSUPP handling logic dropped.

* Preflush is handled as before but postflush is dropped and replaced
with passing down REQ_FUA to member request_queues. This replaces
one array wide cache flush w/ member specific FUA writes.

* __split_and_process_bio() now calls __clone_and_map_flush() directly
for flushes and guarantees all FLUSH bio's going to targets are zero
` length.

* It's now guaranteed that all FLUSH bio's which are passed onto dm
targets are zero length. bio_empty_barrier() tests are replaced
with REQ_FLUSH tests.

* Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.

* Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely
enough to be marked with unlikely().

* Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA
capability.

* Request based dm isn't converted yet. dm_init_request_based_queue()
resets flush support to 0 for now. To avoid disturbing request
based dm code, dm->flush_error is added for bio based dm while
requested based dm continues to use dm->barrier_error.

Lightly tested linear, stripe, raid1, snap and crypt targets. Please
proceed with caution as I'm not familiar with the code base.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: dm-devel@redhat.com
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 7b76ec11 11-Aug-2010 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: support discards

The DM core will submit a discard bio to the stripe target for each
stripe in a striped DM device. The stripe target will determine
stripe-specific portions of the supplied bio to be remapped into
individual (at most 'num_discard_requests' extents). If a given
stripe-specific discard bio doesn't touch a particular stripe the bio
will be dropped.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# c96053b7 11-Aug-2010 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: optimize sector division

Optimize sector division: If the number of stripes is a power of two,
we can do shift and mask instead of division.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 65988525 11-Aug-2010 Mikulas Patocka <mpatocka@redhat.com>

dm stripe: move sector translation to a function

Move sector to stripe translation into a function.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# b441a262 11-Aug-2010 Alasdair G Kergon <agk@redhat.com>

dm: use dm_target_offset macro

Use new dm_target_offset() macro to avoid most references to ti->begin
in dm targets.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 57cba5d3 11-Aug-2010 Mike Snitzer <snitzer@redhat.com>

dm: rename map_info flush_request to target_request_nr

'target_request_nr' is a more generic name that reflects the fact that
it will be used for both flush and discard support.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 7b6d91da 07-Aug-2010 Christoph Hellwig <hch@lst.de>

block: unify flags for struct bio and struct request

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 8215d6ec 05-Mar-2010 Nikanth Karthikesan <knikanth@novell.com>

dm table: remove unused dm_get_device range parameters

Remove unused parameters(start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 781248c1 16-Feb-2010 Nikanth Karthikesan <knikanth@suse.de>

dm stripe: avoid divide by zero with invalid stripe count

If a table containing zero as stripe count is passed into stripe_ctr
the code attempts to divide by zero.

This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count
is zero.

We now get the following error messages:
device-mapper: table: 253:0: striped: Invalid stripe count
device-mapper: ioctl: error adding target to table

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 3c5820c7 11-Sep-2009 Martin K. Petersen <martin.petersen@oracle.com>

block: Optimal I/O limit wrapper

Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper
around it. DM needs this to avoid poking at the queue_limits directly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 1f98a13f 11-Sep-2009 Jens Axboe <jens.axboe@oracle.com>

bio: first step in sanitizing the bio->bi_rw flag testing

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 40bea431 04-Sep-2009 Mike Snitzer <snitzer@redhat.com>

dm stripe: expose correct io hints

Set sensible I/O hints for striped DM devices in the topology
infrastructure added for 2.6.31 for userspace tools to
obtain via sysfs.

Add .io_hints to 'struct target_type' to allow the I/O hints portion
(io_min and io_opt) of the 'struct queue_limits' to be set by each
target and implement this for dm-stripe.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 5dea271b 23-Jul-2009 Mike Snitzer <snitzer@redhat.com>

dm table: pass correct dev area size to device_area_is_valid

Incorrect device area lengths are being passed to device_area_is_valid().

The regression appeared in 2.6.31-rc1 through commit
754c5fc7ebb417b23601a6222a6005cc2e7f2913.

With the dm-stripe target, the size of the target (ti->len) was used
instead of the stripe_width (ti->len/#stripes). An example of a
consequent incorrect error message is:

device-mapper: table: 254:0: sdb too small for target

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# af4874e0 22-Jun-2009 Mike Snitzer <snitzer@redhat.com>

dm target:s introduce iterate devices fn

Add .iterate_devices to 'struct target_type' to allow a function to be
called for all devices in a DM target. Implemented it for all targets
except those in dm-snap.c (origin and snapshot).

(The raid1 version number jumps to 1.12 because we originally reserved
1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors'
instead.)

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: martin.petersen@oracle.com


# 374bf7e7 22-Jun-2009 Mikulas Patocka <mpatocka@redhat.com>

dm: stripe support flush

Flush support for the stripe target.

This sets ti->num_flush_requests to the number of stripes and
remaps individual flush requests to the appropriate stripe devices.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 10d3bd09 05-Jan-2009 Mikulas Patocka <mpatocka@redhat.com>

dm: consolidate target deregistration error handling

Change dm_unregister_target to return void and use BUG() for error
reporting.

dm_unregister_target can only fail because of programming bug in the
target driver. It can't fail because of user's behavior or disk errors.

This patch changes unregister_target to return void and use BUG if
someone tries to unregister non-registered target or unregister target
that is in use.

This patch removes code duplication (testing of error codes in all dm
targets) and reports bugs in just one place, in dm_unregister_target. In
some target drivers, these return codes were ignored, which could lead
to a situation where bugs could be missed.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 6edebdee 13-Nov-2008 Heinz Mauelshagen <heinzm@redhat.com>

dm stripe: fix init failure

Don't proceed if dm_stripe_init() fails to register itself as a dm target.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 586e80e6 21-Oct-2008 Mikulas Patocka <mpatocka@redhat.com>

dm: remove dm header from targets

Change #include "dm.h" to #include <linux/device-mapper.h> in all targets.
Targets should not need direct access to internal DM structures.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# d63a5ce3 21-Oct-2008 Mikulas Patocka <mpatocka@redhat.com>

dm: publish array_too_big

Move array_too_big to include/linux/device-mapper.h because it is
used by targets.

Remove the test from dm-raid1 as the number of mirror legs is limited
such that it can never fail. (Even for stripes it seems rather
unlikely.)

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# f331c029 03-Sep-2008 Tejun Heo <tj@kernel.org>

block: don't depend on consecutive minor space

* Implement disk_devt() and part_devt() and use them to directly
access devt instead of computing it from ->major and ->first_minor.

Note that all references to ->major and ->first_minor outside of
block layer is used to determine devt of the disk (the part0) and as
->major and ->first_minor will continue to represent devt for the
disk, converting these users aren't strictly necessary. However,
convert them for consistency.

* Implement disk_max_parts() to avoid directly deferencing
genhd->minors.

* Update bdget_disk() such that it doesn't assume consecutive minor
space.

* Move devt computation from register_disk() to add_disk() and make it
the only one (all other usages use the initially determined value).

These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 4f7f5c67 07-Feb-2008 Brian Wood <brian.j.wood@intel.com>

dm: stripe enhanced status return

This patch adds additional information to the status line. It is added at the
end of the returned text so it will not interfere with existing
implementations using this data. The addition of this information will allow
for a common return interface to match that returned with the dm-raid1.c
status line (with Jonathan Brassow's patches).

Here is a sample of what is returned with a mirror "status" call:
isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core

Here's what's returned with this patch for a stripe "status" call:
isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# a25eb944 07-Feb-2008 Brian Wood <brian.j.wood@intel.com>

dm: stripe trigger event on failure

This patch adds the stripe_end_io function to process errors that might
occur after an IO operation. As part of this there are a number of
enhancements made to record and trigger events:

- New atomic variable in struct stripe to record the number of
errors each stripe volume device has experienced (could be used
later with uevents to report back directly to userspace)

- New workqueue/work struct setup to process the trigger_event function

- New end_io function. It is here that testing for BIO error conditions
take place. It determines the exact stripe that cause the error,
records this in the new atomic variable, and calls the queue_work() function

- New trigger_event function to process failure events. This
calls dm_table_event()

Signed-off-by: Brian Wood <brian.j.wood@intel.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# 6f3c3f0a 19-Oct-2007 vignesh babu <vignesh.babu@wipro.com>

dm: use is_power_of_2

Replacing n & (n - 1) for power of 2 check by is_power_of_2(n)

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>


# d2a7ad29 08-Dec-2006 Kiyoshi Ueda <k-ueda@ct.jp.nec.com>

[PATCH] dm: map and endio symbolic return codes

Update existing targets to use the new symbols for return values from target
map and end_io functions.

There is no effect on behaviour.

Test results:
Done build test without errors.

Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 72d94861 26-Jun-2006 Alasdair G Kergon <agk@redhat.com>

[PATCH] dm: improve error message consistency

Tidy device-mapper error messages to include context information
automatically.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# a22c96c7 27-Mar-2006 Kevin Corry <kevcorry@us.ibm.com>

[PATCH] dm: remove unnecessary typecast

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4ee218cd 27-Mar-2006 Andrew Morton <akpm@osdl.org>

[PATCH] dm: remove SECTOR_FORMAT

We don't know what type sector_t has. Sometimes it's unsigned long, sometimes
it's unsigned long long. For example on ppc64 it's unsigned long with
CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.

The way to handle all of this is to always use unsigned long long and to
always typecast the sector_t when printing it.

Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 8ba32fde 17-Mar-2006 Kevin Corry <kevcorry@us.ibm.com>

[PATCH] dm stripe: Fix bounds

The dm-stripe target currently does not enforce that the size of a stripe
device be a multiple of the chunk-size. Under certain conditions, this can
lead to I/O requests going off the end of an underlying device. This
test-case shows one example.

echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
dmsetup create stripe0
dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k

This will produce the output:
dd: writing '/dev/mapper/stripe0': Input/output error
97+0 records in
96+0 records out

And in the kernel log will be:
attempt to access beyond end of device
dm-0: rw=0, want=104, limit=100

The patch will check that the table size is a multiple of the stripe
chunk-size when the table is created, which will prevent the above striped
device from being created.

This should not affect tools like LVM or EVMS, since in all the cases I can
think of, striped devices are always created with the sizes being a
multiple of the chunk-size.

The size of a stripe device must be a multiple of its chunk-size.

(akpm: that typecast is quite gratuitous)

Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!