#
f1445032 |
|
18-Sep-2023 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: shortcut the calls to linear_map and stripe_map Shortcut the calls to linear_map and stripe_map, so that they don't suffer the overhead of retpolines used for indirect calls. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
694b3b9d |
|
15-Sep-2023 |
Kees Cook <keescook@chromium.org> |
dm: Annotate struct stripe_c with __counted_by Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct stripe_c. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: dm-devel@redhat.com Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230915200352.never.118-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
|
#
26cb62a2 |
|
16-Mar-2023 |
Yu Zhe <yuzhe@nfschina.com> |
dm: remove unnecessary (void*) conversions Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
a7e8f7fb |
|
19-Apr-2022 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
dm: update targets using system workqueues to use a local workqueue Flushing system-wide workqueues is dangerous and will be forbidden. Use a local workqueue in dm-mpath.c, dm-raid1.c, and dm-stripe.c. Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
1c3fe2fa |
|
07-Feb-2023 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm: avoid useless 'else' after 'break' or return' Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
2e84fecf |
|
03-Feb-2023 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm: avoid split of quoted strings where possible Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
2d0f25cb |
|
02-Feb-2023 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm: remove unnecessary braces from single statement blocks Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
86a3238c |
|
25-Jan-2023 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm: change "unsigned" to "unsigned int" Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
3bd94003 |
|
25-Jan-2023 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm: add missing SPDX-License-Indentifiers 'GPL-2.0-only' is used instead of 'GPL-2.0' because SPDX has deprecated its use. Suggested-by: John Wiele <jwiele@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
#
047218ec |
|
22-Apr-2022 |
Jane Chu <jane.chu@oracle.com> |
dax: add .recovery_write dax_operation Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e511c4a3 |
|
13-May-2022 |
Jane Chu <jane.chu@oracle.com> |
dax: introduce DAX_RECOVERY_WRITE dax access mode Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
a773187e |
|
09-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
scsi: dm: Remove WRITE_SAME support There are no more end-users of REQ_OP_WRITE_SAME left, so we can start deleting it. Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
#
7ac5360c |
|
15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove the copy_from_iter and copy_to_iter methods These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
2a68553e |
|
29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dm-stripe: add a stripe_dax_pgoff helper Add a helper to perform the entire remapping for DAX accesses. This helper open codes bdev_dax_pgoff given that the alignment checks have already been done by the submitting file system and don't need to be repeated. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211129102203.2243509-12-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5d2a228b |
|
29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dm: make the DAX support depend on CONFIG_FS_DAX The device mapper DAX support is all hanging off a block device and thus can't be used with device dax. Make it depend on CONFIG_FS_DAX instead of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to be built under CONFIG_FS_DAX now. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8ec45662 |
|
12-Jul-2021 |
Tushar Sugandhi <tusharsu@linux.microsoft.com> |
dm: update target status functions to support IMA measurement For device mapper targets to take advantage of IMA's measurement capabilities, the status functions for the individual targets need to be updated to handle the status_type_t case for value STATUSTYPE_IMA. Update status functions for the following target types, to log their respective attributes to be measured using IMA. 01. cache 02. crypt 03. integrity 04. linear 05. mirror 06. multipath 07. raid 08. snapshot 09. striped 10. verity For rest of the targets, handle the STATUSTYPE_IMA case by setting the measurement buffer to NULL. For IMA to measure the data on a given system, the IMA policy on the system needs to be updated to have the following line, and the system needs to be restarted for the measurements to take effect. /etc/ima/ima-policy measure func=CRITICAL_DATA label=device-mapper template=ima-buf The measurements will be reflected in the IMA logs, which are located at: /sys/kernel/security/integrity/ima/ascii_runtime_measurements /sys/kernel/security/integrity/ima/binary_runtime_measurements These IMA logs can later be consumed by various attestation clients running on the system, and send them to external services for attesting the system. The DM target data measured by IMA subsystem can alternatively be queried from userspace by setting DM_IMA_MEASUREMENT_FLAG with DM_TABLE_STATUS_CMD. Signed-off-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
410fe220 |
|
12-Nov-2020 |
Jeffle Xu <jefflexu@linux.alibaba.com> |
dm: add support for REQ_NOWAIT to various targets commit 021a24460dc2 ("block: add QUEUE_FLAG_NOWAIT") added a new queue flag QUEUE_FLAG_NOWAIT to advertise if the bdev supports handling of REQ_NOWAIT or not. DM core supports stacking QUEUE_FLAG_NOWAIT since commit 6abc49468eea ("dm: add support for REQ_NOWAIT and enable it for linear target"), in which only dm-linear enabled it. Update others DM targets, which just do simple remapping, to enable support for REQ_NOWAIT. Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
b18ae8dd |
|
07-May-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
dm: replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
cdf6cdcd |
|
28-Feb-2020 |
Vivek Goyal <vgoyal@redhat.com> |
dm,dax: Add dax zero_page_range operation This patch adds support for dax zero_page_range operation to dm targets. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8adeac3b |
|
02-Oct-2019 |
Gustavo A. R. Silva <gustavo@embeddedor.com> |
dm stripe: use struct_size() in kmalloc() One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct stripe_c { ... struct stripe stripe[0]; }; In this case alloc_context() and dm_array_too_big() are removed and replaced by the direct use of the struct_size() helper in kmalloc(). Notice that open-coded form is prone to type mistakes. This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
b3a9a0c3 |
|
02-May-2018 |
Dan Williams <dan.j.williams@intel.com> |
dax: Introduce a ->copy_to_iter dax operation Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture or device specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
00716545 |
|
13-Mar-2018 |
Denis Semakin <d.semakin@omprussia.ru> |
dm: add support for secure erase forwarding Set QUEUE_FLAG_SECERASE in DM device's queue_flags if a DM table's data devices support secure erase. Also, add support for secure erase to both the linear and striped targets. Signed-off-by: Denis Semakin <d.semakin@omprussia.ru> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
706dd22f |
|
08-Mar-2018 |
Tycho Andersen <tycho@tycho.pizza> |
dm stripe: get rid of a Variable Length Array (VLA) Ideally, we'd like to get rid of all VLAs in the kernel and add -Wvla to the build args: https://lkml.org/lkml/2018/3/7/621 This one is a simple case, since we don't actually need the VLA at all: we can just iterate over the stripes twice, once to emit their names, and the second time to emit status (i.e. trade memory for time). Since the number of stripes is probably low, this is hopefully not that expensive. Signed-off-by: Tycho Andersen <tycho@tycho.ws> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
976431b0 |
|
29-Mar-2018 |
Dan Williams <dan.j.williams@intel.com> |
dax, dm: allow device-mapper to operate without dax support Change device-mapper's DAX dependency to require the presence of at least one DAX_DRIVER. This allows device-mapper to be built without bringing the DAX core along which is especially wasteful when there are no DAX drivers, like BLK_DEV_PMEM, configured. Cc: Alasdair Kergon <agk@redhat.com> Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c3ca015f |
|
31-Aug-2017 |
Mikulas Patocka <mpatocka@redhat.com> |
dax: remove the pmem_dax_ops->flush abstraction Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is buggy. A DM device may be composed of multiple underlying devices and all of them need to be flushed. That commit just routes the flush request to the first device and ignores the other devices. It could be fixed by adding more complex logic to the device mapper. But there is only one implementation of the method pmem_dax_ops->flush - that is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we don't need the pmem_dax_ops->flush abstraction at all, we can call arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush can't ever reach anything different from arch_wb_cache_pmem(). It should be also pointed out that for some uses of persistent memory it is needed to flush only a very small amount of data (such as 1 cacheline), and it would be overkill if we go through that device mapper machinery for a single flushed cache line. Fix this by removing the pmem_dax_ops->flush abstraction and call arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device mapper code that forwards the flushes. Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support") Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
74d46992 |
|
23-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
abebfbe2 |
|
29-May-2017 |
Dan Williams <dan.j.williams@intel.com> |
dm: add ->flush() dax operation support Allow device-mapper to route flush operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying flush implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7e026c8c |
|
29-May-2017 |
Dan Williams <dan.j.williams@intel.com> |
dm: add ->copy_from_iter() dax operation support Allow device-mapper to route copy_from_iter operations to the per-target implementation. In order for the device stacking to work we need a dax_dev and a pgoff relative to that device. This gives each layer of the stack the information it needs to look up the operation pointer for the next level. This conceptually allows for an array of mixed device drivers with varying copy_from_iter implementations. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
4e4cbee9 |
|
03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
1be56909 |
|
03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
dm: change ->end_io calling convention Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
9966afaf |
|
03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
dm: fix REQ_RAHEAD handling A few (but not all) dm targets use a special EWOULDBLOCK error code for failing REQ_RAHEAD requests that fail due to a lack of available resources. But no one else knows about this magic code, and lower level drivers also don't generate it when failing read-ahead requests for similar reasons. So remove this special casing and ignore all additional error handling for REQ_RAHEAD - if this was a real underlying error we'd get a normal read once the real read comes in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
817bf402 |
|
12-Apr-2017 |
Dan Williams <dan.j.williams@intel.com> |
dm: teach dm-targets to use a dax_device + dax_operations Arrange for dm to lookup the dax services available from member devices. Update the dax-capable targets, linear and stripe, to route dax operations to the underlying device. Changes the target-internal ->direct_access() method to more closely align with the dax_operations ->direct_access() calling convention. Cc: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e2460f2a |
|
18-Apr-2017 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: mark targets that pass integrity data A dm-crypt on dm-integrity device incorrectly advertises an integrity profile on the DM crypt device. It can be seen in the files "/sys/block/dm-*/integrity/*" that both dm-integrity and dm-crypt target advertise the integrity profile. That is incorrect, only the dm-integrity target should advertise the integrity profile. A general problem in DM is that if we have a DM device that depends on another device with an integrity profile, the upper device will always advertise the integrity profile, even when the target driver doesn't support handling integrity data. Most targets don't support integrity data, so we provide a whitelist of targets that support it (linear, delay and striped). The targets that support passing integrity data to the lower device are marked with the flag DM_TARGET_PASSES_INTEGRITY. The DM core will now advertise integrity data on a DM device only if all the targets support the integrity data. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
ac62d620 |
|
05-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
dm: support REQ_OP_WRITE_ZEROES Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
1eff9d32 |
|
05-Aug-2016 |
Jens Axboe <axboe@fb.com> |
block: rename bio bi_rw to bi_opf Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
beec25b4 |
|
24-Jun-2016 |
Toshi Kani <toshi.kani@hpe.com> |
dm stripe: add DAX support Change dm-stripe to implement direct_access function, stripe_direct_access(), which maps bdev and sector and calls direct_access function of its physical target device. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
28a8f0d3 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
e6047149 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
dm: use bio op accessors Separate the op from the rq_flag_bits and have dm set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
8ae12666 |
|
28-Apr-2015 |
Kent Overstreet <kent.overstreet@gmail.com> |
block: kill merge_bvec_fn() completely As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
e80d1c80 |
|
31-Jul-2015 |
Vivek Goyal <vgoyal@redhat.com> |
dm: do not override error code returned from dm_get_device() Some of the device mapper targets override the error code returned by dm_get_device() and return either -EINVAL or -ENXIO. There is nothing gained by this override. It is better to propagate the returned error code unchanged to caller. This work was motivated by hitting an issue where the underlying device was busy but -EINVAL was being returned. After this change we get -EBUSY instead and it is easier to figure out the problem. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
4246a0b6 |
|
20-Jul-2015 |
Christoph Hellwig <hch@lst.de> |
block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
e223e1de |
|
27-Apr-2015 |
Luis Henriques <luis.henriques@canonical.com> |
dm stripe: drop useless exit point from dm_stripe_init() Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
a3f2af25 |
|
10-Oct-2014 |
Pavitra Kumar <pavitrak@nvidia.com> |
dm stripe: fix potential for leak in stripe_ctr error path Fix a potential struct stripe_c leak that would occur if the chunk_size exceeded the maximum allowed by dm_set_target_max_io_len (UINT_MAX). However, in practice there is no possibility of this occuring given that chunk_size is of type uint32_t. But it is good to fix this to future-proof in case dm_set_target_max_io_len's implementation were to change. Signed-off-by: Pavitra Kumar <pavitrak@nvidia.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
#
4f024f37 |
|
11-Oct-2013 |
Kent Overstreet <kmo@daterainc.com> |
block: Abstract out bvec iterator Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
|
#
7fff5e8f |
|
06-Sep-2013 |
Mike Snitzer <snitzer@redhat.com> |
dm stripe: silence a couple sparse warnings Eliminate the following sparse warnings: drivers/md/dm-stripe.c:443:12: warning: symbol 'dm_stripe_init' was not declared. Should it be static? drivers/md/dm-stripe.c:456:6: warning: symbol 'dm_stripe_exit' was not declared. Should it be static? Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
d793e684 |
|
10-May-2013 |
Mike Snitzer <snitzer@redhat.com> |
dm stripe: fix regression in stripe_width calculation Fix a regression in the calculation of the stripe_width in the dm stripe target which led to incorrect processing of device limits. The stripe_width is the stripe device length divided by the number of stripes. The group of commits in the range f14fa69 ("dm stripe: fix size test") to eb850de ("dm stripe: support for non power of 2 chunksize") interfered with each other (a merging error) and led to the stripe_width being set incorrectly to the stripe device length divided by chunk_size * stripe_count. For example, a stripe device's table with: 0 33553920 striped 3 512 ... should result in a stripe_width of 11184640 (33553920 / 3), but due to the bug it was getting set to 21845 (33553920 / (512 * 3)). The impact of this bug is that device topologies that previously worked fine with the stripe target are no longer considered valid. In particular, there is a higher risk of seeing this issue if one of the stripe devices has a 4K logical block size. Resulting in an error message like this: "device-mapper: table: 253:4: len=21845 not aligned to h/w logical block size 4096 of dm-1" The fix is to swap the order of the divisions and to use a temporary variable for the second one, so that width retains the intended value. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.6+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
f73a1c7d |
|
25-Sep-2012 |
Kent Overstreet <koverstreet@google.com> |
block: Add bio_end_sector() Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
55a62eef |
|
01-Mar-2013 |
Alasdair G Kergon <agk@redhat.com> |
dm: rename request variables to bios Use 'bio' in the name of variables and functions that deal with bios rather than 'request' to avoid confusion with the normal block layer use of 'request'. No functional changes. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
fd7c092e |
|
01-Mar-2013 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: fix truncated status strings Avoid returning a truncated table or status string instead of setting the DM_BUFFER_FULL_FLAG when the last target of a table fills the buffer. When processing a table or status request, the function retrieve_status calls ti->type->status. If ti->type->status returns non-zero, retrieve_status assumes that the buffer overflowed and sets DM_BUFFER_FULL_FLAG. However, targets don't return non-zero values from their status method on overflow. Most targets returns always zero. If a buffer overflow happens in a target that is not the last in the table, it gets noticed during the next iteration of the loop in retrieve_status; but if a buffer overflow happens in the last target, it goes unnoticed and erroneously truncated data is returned. In the current code, the targets behave in the following way: * dm-crypt returns -ENOMEM if there is not enough space to store the key, but it returns 0 on all other overflows. * dm-thin returns errors from the status method if a disk error happened. This is incorrect because retrieve_status doesn't check the error code, it assumes that all non-zero values mean buffer overflow. * all the other targets always return 0. This patch changes the ti->type->status function to return void (because most targets don't use the return code). Overflow is detected in retrieve_status: if the status method fills up the remaining space completely, it is assumed that buffer overflow happened. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
45e621d4 |
|
21-Dec-2012 |
Mike Snitzer <snitzer@redhat.com> |
dm stripe: add WRITE SAME support Rename stripe_map_discard to stripe_map_range and reuse it for WRITE SAME bio processing. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
7de3ee57 |
|
21-Dec-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: remove map_info This patch removes map_info from bio-based device mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
ddbd658f |
|
21-Dec-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: move target request nr to dm_target_io This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch that removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
43829731 |
|
20-Aug-2012 |
Tejun Heo <tj@kernel.org> |
workqueue: deprecate flush[_delayed]_work_sync() flush[_delayed]_work_sync() are now spurious. Mark them deprecated and convert all users to flush[_delayed]_work(). If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant and the regular flushes guarantee that the work item is not pending or running on any CPU on return, so there's no reason to use the sync flushes at all and they're going away. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mattia Dongili <malattia@linux.it> Cc: Kent Yoder <key@linux.vnet.ibm.com> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Bryan Wu <bryan.wu@canonical.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-wireless@vger.kernel.org Cc: Anton Vorontsov <cbou@mail.ru> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Avi Kivity <avi@redhat.com>
|
#
1f4e0ff0 |
|
27-Jul-2012 |
Alasdair G Kergon <agk@redhat.com> |
dm thin: commit before gathering status Commit outstanding metadata before returning the status for a dm thin pool so that the numbers reported are as up-to-date as possible. The commit is not performed if the device is suspended or if the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target through a new 'status_flags' parameter in the target's dm_status_fn. The userspace dmsetup tool will support the --noflush flag with the 'dmsetup status' and 'dmsetup wait' commands from version 1.02.76 onwards. Tested-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
33d07c0d |
|
27-Jul-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: optimize chunk_size calculations dm-stripe is usually used with a chunk size that is a power of two. Use faster shifts and bit masks in such cases. stripe_width is already optimized in a similar way. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
8f069b41 |
|
27-Jul-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: remove minimum stripe size There is no technical limitation in device mapper that would prevent the dm-stripe target from using a stripe size smaller than page size. This patch removes the limit and makes stripe volumes portable across architectures with different page size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
eb850de6 |
|
27-Jul-2012 |
Mike Snitzer <snitzer@redhat.com> |
dm stripe: support for non power of 2 chunksize Support non-power-of-2 chunk sizes with dm striping for proper alignment of stripe IO on storage that has non-power-of-2 optimal IO sizes (e.g. RAID6 10+2). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
542f9038 |
|
27-Jul-2012 |
Mike Snitzer <snitzer@redhat.com> |
dm: support non power of two target max_io_len Remove the restriction that limits a target's specified maximum incoming I/O size to be a power of 2. Rename this setting from 'split_io' to the less-ambiguous 'max_io_len'. Change it from sector_t to uint32_t, which is plenty big enough, and introduce a wrapper function dm_set_target_max_io_len() to set it. Use sector_div() to process it now that it is not necessarily a power of 2. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
1df05483 |
|
27-Jul-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: remove stripes_mask The structure stripe_c contains a stripes_mask field. This field is useless because it can be trivially calculated by subtracting one from stripes. It is used only at one place. This patch removes it. The patch also changes ffs(stripes) - 1 to __ffs(stripes). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
f14fa693 |
|
27-Jul-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: fix size test dm-stripe is supposed to ensure that all the space allocated to the stripes is fully used and that all stripes are the same size. This patch fixes the test. It checks that device length is divisible by the chunk size and checks that the resulting quotient is divisible by the number of stripes (which is equivalent to testing if device length is divisible by chunk_size * stripes). Previously, the code only tested that the number of sectors in the target was divisible by each of the chunk size and the number of stripes separately, which could leave entire stripes unused. (A setup that genuinely needs some stripes to be shorter than others can be created by concatenating striped targets.) Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
1a66a08a |
|
27-Jul-2012 |
majianpeng <majianpeng@gmail.com> |
dm: replace simple_strtoul Replace obsolete simple_strtoul() with kstrtou8/kstrtouint. Signed-off-by: majianpeng <majianpeng@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
31998ef1 |
|
28-Mar-2012 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: reject trailing characters in sccanf input Device mapper uses sscanf to convert arguments to numbers. The problem is that the way we use it ignores additional unmatched characters in the scanned string. For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number, but also it will match number with some garbage appended, like "123abc". As a result, device mapper accepts garbage after some numbers. For example the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"' will pass without an error. This patch fixes all sscanf uses in device mapper. It appends "%c" with a pointer to a dummy character variable to every sscanf statement. The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds only if string is a null-terminated number (optionally preceded by some whitespace characters). If there is some character appended after the number, sscanf matches "%c", writes the character to the dummy variable and returns 2. We check the return value for 1 and consequently reject numbers with some garbage appended. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
29915202 |
|
24-Mar-2011 |
Mustafa Mesanovic <mume@linux.vnet.ibm.com> |
dm stripe: implement merge method Implement a merge function in the striped target. When the striped target's underlying devices provide a merge_bvec_fn (like all DM devices do via dm_merge_bvec) it is important to call down to them when building a biovec that doesn't span a stripe boundary. Without the merge method, a striped DM device stacked on DM devices causes bios with a single page to be submitted which results in unnecessary overhead that hurts performance. This change really helps filesystems (e.g. XFS and now ext4) which take care to assemble larger bios. By implementing stripe_merge(), DM and the stripe target no longer undermine the filesystem's work by only allowing a single page per bio. Buffered IO sees the biggest improvement (particularly uncached reads, buffered writes to a lesser degree). This is especially so for more capable "enterprise" storage LUNs. The performance improvement has been measured to be ~12-35% -- when a reasonable chunk_size is used (e.g. 64K) in conjunction with a stripe count that is a power of 2. In contrast, the performance penalty is ~5-7% for the pathological worst case stripe configuration (small chunk_size with a stripe count that is not a power of 2). The reason for this is that stripe_map_sector() is now called once for every call to dm_merge_bvec(). stripe_map_sector() will use slower division if stripe count isn't a power of 2. Signed-off-by: Mustafa Mesanovic <mume@linux.vnet.ibm.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
f521f074 |
|
13-Jan-2011 |
Tejun Heo <tj@kernel.org> |
dm stripe: switch from local workqueue to system_wq kstriped only serves sc->kstriped_ws which runs dm_table_event(). This doesn't need to be executed from an ordered workqueue w/ rescuer. Drop kstriped and use the system_wq instead. While at it, rename kstriped_ws to trigger_event so that it's consistent with other dm modules. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
d87f4c14 |
|
03-Sep-2010 |
Tejun Heo <tj@kernel.org> |
dm: implement REQ_FLUSH/FUA support for bio-based dm This patch converts bio-based dm to support REQ_FLUSH/FUA instead of now deprecated REQ_HARDBARRIER. * -EOPNOTSUPP handling logic dropped. * Preflush is handled as before but postflush is dropped and replaced with passing down REQ_FUA to member request_queues. This replaces one array wide cache flush w/ member specific FUA writes. * __split_and_process_bio() now calls __clone_and_map_flush() directly for flushes and guarantees all FLUSH bio's going to targets are zero ` length. * It's now guaranteed that all FLUSH bio's which are passed onto dm targets are zero length. bio_empty_barrier() tests are replaced with REQ_FLUSH tests. * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes. * Dropped unlikely() around REQ_FLUSH tests. Flushes are not unlikely enough to be marked with unlikely(). * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue doesn't support cache flushing. Advertise REQ_FLUSH | REQ_FUA capability. * Request based dm isn't converted yet. dm_init_request_based_queue() resets flush support to 0 for now. To avoid disturbing request based dm code, dm->flush_error is added for bio based dm while requested based dm continues to use dm->barrier_error. Lightly tested linear, stripe, raid1, snap and crypt targets. Please proceed with caution as I'm not familiar with the code base. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
7b76ec11 |
|
11-Aug-2010 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: support discards The DM core will submit a discard bio to the stripe target for each stripe in a striped DM device. The stripe target will determine stripe-specific portions of the supplied bio to be remapped into individual (at most 'num_discard_requests' extents). If a given stripe-specific discard bio doesn't touch a particular stripe the bio will be dropped. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
c96053b7 |
|
11-Aug-2010 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: optimize sector division Optimize sector division: If the number of stripes is a power of two, we can do shift and mask instead of division. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
65988525 |
|
11-Aug-2010 |
Mikulas Patocka <mpatocka@redhat.com> |
dm stripe: move sector translation to a function Move sector to stripe translation into a function. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
b441a262 |
|
11-Aug-2010 |
Alasdair G Kergon <agk@redhat.com> |
dm: use dm_target_offset macro Use new dm_target_offset() macro to avoid most references to ti->begin in dm targets. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
57cba5d3 |
|
11-Aug-2010 |
Mike Snitzer <snitzer@redhat.com> |
dm: rename map_info flush_request to target_request_nr 'target_request_nr' is a more generic name that reflects the fact that it will be used for both flush and discard support. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
7b6d91da |
|
07-Aug-2010 |
Christoph Hellwig <hch@lst.de> |
block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
8215d6ec |
|
05-Mar-2010 |
Nikanth Karthikesan <knikanth@novell.com> |
dm table: remove unused dm_get_device range parameters Remove unused parameters(start and len) of dm_get_device() and fix the callers. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
781248c1 |
|
16-Feb-2010 |
Nikanth Karthikesan <knikanth@suse.de> |
dm stripe: avoid divide by zero with invalid stripe count If a table containing zero as stripe count is passed into stripe_ctr the code attempts to divide by zero. This patch changes DM_TABLE_LOAD to return -EINVAL if the stripe count is zero. We now get the following error messages: device-mapper: table: 253:0: striped: Invalid stripe count device-mapper: ioctl: error adding target to table Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: stable@kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
3c5820c7 |
|
11-Sep-2009 |
Martin K. Petersen <martin.petersen@oracle.com> |
block: Optimal I/O limit wrapper Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
1f98a13f |
|
11-Sep-2009 |
Jens Axboe <jens.axboe@oracle.com> |
bio: first step in sanitizing the bio->bi_rw flag testing Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
40bea431 |
|
04-Sep-2009 |
Mike Snitzer <snitzer@redhat.com> |
dm stripe: expose correct io hints Set sensible I/O hints for striped DM devices in the topology infrastructure added for 2.6.31 for userspace tools to obtain via sysfs. Add .io_hints to 'struct target_type' to allow the I/O hints portion (io_min and io_opt) of the 'struct queue_limits' to be set by each target and implement this for dm-stripe. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
5dea271b |
|
23-Jul-2009 |
Mike Snitzer <snitzer@redhat.com> |
dm table: pass correct dev area size to device_area_is_valid Incorrect device area lengths are being passed to device_area_is_valid(). The regression appeared in 2.6.31-rc1 through commit 754c5fc7ebb417b23601a6222a6005cc2e7f2913. With the dm-stripe target, the size of the target (ti->len) was used instead of the stripe_width (ti->len/#stripes). An example of a consequent incorrect error message is: device-mapper: table: 254:0: sdb too small for target Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
af4874e0 |
|
22-Jun-2009 |
Mike Snitzer <snitzer@redhat.com> |
dm target:s introduce iterate devices fn Add .iterate_devices to 'struct target_type' to allow a function to be called for all devices in a DM target. Implemented it for all targets except those in dm-snap.c (origin and snapshot). (The raid1 version number jumps to 1.12 because we originally reserved 1.1 to 1.11 for 'block_on_error' but ended up using 'handle_errors' instead.) Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: martin.petersen@oracle.com
|
#
374bf7e7 |
|
22-Jun-2009 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: stripe support flush Flush support for the stripe target. This sets ti->num_flush_requests to the number of stripes and remaps individual flush requests to the appropriate stripe devices. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
10d3bd09 |
|
05-Jan-2009 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: consolidate target deregistration error handling Change dm_unregister_target to return void and use BUG() for error reporting. dm_unregister_target can only fail because of programming bug in the target driver. It can't fail because of user's behavior or disk errors. This patch changes unregister_target to return void and use BUG if someone tries to unregister non-registered target or unregister target that is in use. This patch removes code duplication (testing of error codes in all dm targets) and reports bugs in just one place, in dm_unregister_target. In some target drivers, these return codes were ignored, which could lead to a situation where bugs could be missed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
6edebdee |
|
13-Nov-2008 |
Heinz Mauelshagen <heinzm@redhat.com> |
dm stripe: fix init failure Don't proceed if dm_stripe_init() fails to register itself as a dm target. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
586e80e6 |
|
21-Oct-2008 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: remove dm header from targets Change #include "dm.h" to #include <linux/device-mapper.h> in all targets. Targets should not need direct access to internal DM structures. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
d63a5ce3 |
|
21-Oct-2008 |
Mikulas Patocka <mpatocka@redhat.com> |
dm: publish array_too_big Move array_too_big to include/linux/device-mapper.h because it is used by targets. Remove the test from dm-raid1 as the number of mirror legs is limited such that it can never fail. (Even for stripes it seems rather unlikely.) Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
f331c029 |
|
03-Sep-2008 |
Tejun Heo <tj@kernel.org> |
block: don't depend on consecutive minor space * Implement disk_devt() and part_devt() and use them to directly access devt instead of computing it from ->major and ->first_minor. Note that all references to ->major and ->first_minor outside of block layer is used to determine devt of the disk (the part0) and as ->major and ->first_minor will continue to represent devt for the disk, converting these users aren't strictly necessary. However, convert them for consistency. * Implement disk_max_parts() to avoid directly deferencing genhd->minors. * Update bdget_disk() such that it doesn't assume consecutive minor space. * Move devt computation from register_disk() to add_disk() and make it the only one (all other usages use the initially determined value). These changes clean up the code and will help disk->part dereference fix and extended block device numbers. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
|
#
4f7f5c67 |
|
07-Feb-2008 |
Brian Wood <brian.j.wood@intel.com> |
dm: stripe enhanced status return This patch adds additional information to the status line. It is added at the end of the returned text so it will not interfere with existing implementations using this data. The addition of this information will allow for a common return interface to match that returned with the dm-raid1.c status line (with Jonathan Brassow's patches). Here is a sample of what is returned with a mirror "status" call: isw_eeaaabgfg_mirror: 0 488390920 mirror 2 8:16 8:32 3727/3727 1 AA 1 core Here's what's returned with this patch for a stripe "status" call: isw_dheeijjdej_stripe: 0 976783872 striped 2 8:16 8:32 1 AA Signed-off-by: Brian Wood <brian.j.wood@intel.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
a25eb944 |
|
07-Feb-2008 |
Brian Wood <brian.j.wood@intel.com> |
dm: stripe trigger event on failure This patch adds the stripe_end_io function to process errors that might occur after an IO operation. As part of this there are a number of enhancements made to record and trigger events: - New atomic variable in struct stripe to record the number of errors each stripe volume device has experienced (could be used later with uevents to report back directly to userspace) - New workqueue/work struct setup to process the trigger_event function - New end_io function. It is here that testing for BIO error conditions take place. It determines the exact stripe that cause the error, records this in the new atomic variable, and calls the queue_work() function - New trigger_event function to process failure events. This calls dm_table_event() Signed-off-by: Brian Wood <brian.j.wood@intel.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
6f3c3f0a |
|
19-Oct-2007 |
vignesh babu <vignesh.babu@wipro.com> |
dm: use is_power_of_2 Replacing n & (n - 1) for power of 2 check by is_power_of_2(n) Signed-off-by: vignesh babu <vignesh.babu@wipro.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
|
#
d2a7ad29 |
|
08-Dec-2006 |
Kiyoshi Ueda <k-ueda@ct.jp.nec.com> |
[PATCH] dm: map and endio symbolic return codes Update existing targets to use the new symbols for return values from target map and end_io functions. There is no effect on behaviour. Test results: Done build test without errors. Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: dm-devel@redhat.com Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
72d94861 |
|
26-Jun-2006 |
Alasdair G Kergon <agk@redhat.com> |
[PATCH] dm: improve error message consistency Tidy device-mapper error messages to include context information automatically. Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a22c96c7 |
|
27-Mar-2006 |
Kevin Corry <kevcorry@us.ibm.com> |
[PATCH] dm: remove unnecessary typecast Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
4ee218cd |
|
27-Mar-2006 |
Andrew Morton <akpm@osdl.org> |
[PATCH] dm: remove SECTOR_FORMAT We don't know what type sector_t has. Sometimes it's unsigned long, sometimes it's unsigned long long. For example on ppc64 it's unsigned long with CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n. The way to handle all of this is to always use unsigned long long and to always typecast the sector_t when printing it. Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
8ba32fde |
|
17-Mar-2006 |
Kevin Corry <kevcorry@us.ibm.com> |
[PATCH] dm stripe: Fix bounds The dm-stripe target currently does not enforce that the size of a stripe device be a multiple of the chunk-size. Under certain conditions, this can lead to I/O requests going off the end of an underlying device. This test-case shows one example. echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0 echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1 echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \ dmsetup create stripe0 dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k This will produce the output: dd: writing '/dev/mapper/stripe0': Input/output error 97+0 records in 96+0 records out And in the kernel log will be: attempt to access beyond end of device dm-0: rw=0, want=104, limit=100 The patch will check that the table size is a multiple of the stripe chunk-size when the table is created, which will prevent the above striped device from being created. This should not affect tools like LVM or EVMS, since in all the cases I can think of, striped devices are always created with the sizes being a multiple of the chunk-size. The size of a stripe device must be a multiple of its chunk-size. (akpm: that typecast is quite gratuitous) Signed-off-by: Kevin Corry <kevcorry@us.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|