#
396799eb |
|
03-Mar-2024 |
Christoph Hellwig <hch@lst.de> |
md: remove mddev->queue Just use the request_queue from the gendisk pointer in the relatively few places that sill need it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed--by: Song Liu <song@kernel.org> Tested-by: Song Liu <song@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20240303140150.5435-11-hch@lst.de
|
#
ad860670 |
|
25-Nov-2023 |
Yu Kuai <yukuai3@huawei.com> |
md/raid5: remove rcu protection to access rdev from conf Because it's safe to accees rdev from conf: - If any spinlock is held, because synchronize_rcu() from md_kick_rdev_from_array() will prevent 'rdev' to be freed until spinlock is released; - If 'reconfig_lock' is held, because rdev can't be added or removed from array; - If there is normal IO inflight, because mddev_suspend() will prevent rdev to be added or removed from array; - If there is sync IO inflight, because 'MD_RECOVERY_RUNNING' is checked in remove_and_add_spares(). And these will cover all the scenarios in raid456. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20231125081604.3939938-5-yukuai1@huaweicloud.com
|
#
6eea4ff8 |
|
31-May-2023 |
Johannes Thumshirn <johannes.thumshirn@wdc.com> |
md: raid5: use __bio_add_page to add single page to new bio The raid5-ppl submission code uses bio_add_page() to add a page to a newly created bio. bio_add_page() can fail, but the return value is never checked. For adding consecutive pages, the return is actually checked and a new bio is allocated if adding the page fails. Use __bio_add_page() as adding a single page to a newly created bio is guaranteed to succeed. This brings us a step closer to marking bio_add_page() as __must_check. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/27e6bcd762354bff74602e89159cdd12ae3d1fa9.1685532726.git.johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ad831a16 |
|
09-Nov-2022 |
Christoph Hellwig <hch@lst.de> |
md/raid5: use bdev_write_cache instead of open coding it Use the bdev_write_cache instead of two equivalent open coded checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
|
#
e0fccdaf |
|
08-Jun-2022 |
Logan Gunthorpe <logang@deltatee.com> |
md/raid5-ppl: Drop unused argument from ppl_handle_flush_request() ppl_handle_flush_request() takes an struct r5log argument but doesn't use it. It has no buisiness taking this argument as it is only used by raid5-cache and has no way to derference it anyway. Remove the argument. No functional changes intended. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4ce4c73f |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
md/core: Combine two sync_page_io() arguments Improve uniformity in the kernel of handling of request operation and flags by passing these as a single argument. Cc: Song Liu <song@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-32-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f34fdcd4 |
|
08-Jun-2022 |
Logan Gunthorpe <logang@deltatee.com> |
md/raid5-ppl: Fix argument order in bio_alloc_bioset() bio_alloc_bioset() takes a block device, number of vectors, the OP flags, the GFP mask and the bio set. However when the prototype was changed, the callisite in ppl_do_flush() had the OP flags and the GFP flags reversed. This introduced some sparse error: drivers/md/raid5-ppl.c:632:57: warning: incorrect type in argument 3 (different base types) drivers/md/raid5-ppl.c:632:57: expected unsigned int opf drivers/md/raid5-ppl.c:632:57: got restricted gfp_t [usertype] drivers/md/raid5-ppl.c:633:61: warning: incorrect type in argument 4 (different base types) drivers/md/raid5-ppl.c:633:61: expected restricted gfp_t [usertype] gfp_mask drivers/md/raid5-ppl.c:633:61: got unsigned long long The sparse error introduction may not have been reported correctly by 0day due to other work that was cleaning up other sparse errors in this area. Fixes: 609be1066731 ("block: pass a block_device and opf to bio_alloc_bioset") Cc: stable@vger.kernel.org # 5.18+ Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
|
#
913cce5a |
|
12-May-2022 |
Christoph Hellwig <hch@lst.de> |
md: remove most calls to bdevname Use the %pg format specifier to save on stack consumption and code size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
|
#
4f4ee2bf |
|
07-Apr-2022 |
Logan Gunthorpe <logang@deltatee.com> |
md/raid5-ppl: Annotate with rcu_dereference_protected() To suppress the last remaining sparse warnings about accessing rdev, add rcu_dereference_protected calls to a couple places in raid5-ppl. All of these places are called under raid5_run and therefore are occurring before the array has started and is thus safe. There's no sensible check to do for the second argument of rcu_dereference_protected() so a comment is added instead. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
|
#
c75e707f |
|
04-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove the per-bio/request write hint With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9f7c3f83 |
|
28-Feb-2022 |
Christoph Hellwig <hch@lst.de> |
raid5-ppl: fully initialize the bio in ppl_new_iounit We have all the information to pass the bdev and op directly to bio_init, so do that. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
|
#
c7dec462 |
|
04-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
raid5-ppl: stop using bio_devname Use the %pg format specifier to save on stack consuption and code size. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220304180105.409765-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
49add496 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
609be106 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_alloc_bioset Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1e37799b |
|
28-Oct-2021 |
Yang Guang <yang.guang5@zte.com.cn> |
raid5-ppl: use swap() to make code cleaner Use the macro `swap()` defined in `include/linux/minmax.h` to avoid opencoding it. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
|
#
a8affc03 |
|
10-Mar-2021 |
Christoph Hellwig <hch@lst.de> |
block: rename BIO_MAX_PAGES to BIO_MAX_VECS Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c6bf3f0e |
|
26-Jan-2021 |
Christoph Hellwig <hch@lst.de> |
block: use an on-stack bio in blkdev_issue_flush There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c911c46c |
|
18-Jul-2020 |
Yufen Yu <yuyufen@huawei.com> |
md/raid456: convert macro STRIPE_* to RAID5_STRIPE_* Convert macro STRIPE_SIZE, STRIPE_SECTORS and STRIPE_SHIFT to RAID5_STRIPE_SIZE(), RAID5_STRIPE_SECTORS() and RAID5_STRIPE_SHIFT(). This patch is prepare for the following adjustable stripe_size. It will not change any existing functionality. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
|
#
9398554f |
|
13-May-2020 |
Christoph Hellwig <hch@lst.de> |
block: remove the error_sector argument to blkdev_issue_flush The argument isn't used by any caller, and drivers don't fill out bi_sector for flush requests either. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c593642c |
|
09-Dec-2019 |
Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> |
treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net
|
#
0815ef3c |
|
20-Sep-2019 |
Eugene Syromiatnikov <esyr@redhat.com> |
drivers/md/raid5-ppl.c: use the new spelling of RWH_WRITE_LIFE_NOT_SET As it is consistent with prefixes of other write life time hints. Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com>
|
#
2025cf9e |
|
29-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
a596d086 |
|
18-Feb-2019 |
Mariusz Dabrowski <mariusz.dabrowski@intel.com> |
raid5: set write hint for PPL When the Partial Parity Log is enabled, circular buffer is used to store PPL data. Each write to RAID device causes overwrite of data in this buffer so some write_hint can be set to those request to help drives handle garbage collection. This patch adds new sysfs attribute which can be used to specify which write_hint should be assigned to PPL. Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Mariusz Dabrowski <mariusz.dabrowski@intel.com> Signed-off-by: Song Liu <songliubraving@fb.com>
|
#
b330e6a4 |
|
12-Mar-2019 |
Kent Overstreet <kent.overstreet@gmail.com> |
md: convert to kvmalloc The code really just wants a big flat buffer, so just do that. Link: http://lkml.kernel.org/r/20181217131929.11727-3-kent.overstreet@gmail.com Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Matthew Wilcox <willy@infradead.org> Cc: Shaohua Li <shli@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Paris <eparis@parisplace.org> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Pravin B Shelar <pshelar@ovn.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
afeee514 |
|
20-May-2018 |
Kent Overstreet <kent.overstreet@gmail.com> |
md: convert to bioset_init()/mempool_init() Convert md to embedded bio sets. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f4bc0c81 |
|
20-Feb-2018 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: fix handling flush requests Add missing bio completion. Without this any flush request would hang. Fixes: 1532d9e87e8b ("raid5-ppl: PPL support for disks with write-back cache enabled") Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
|
#
1532d9e8 |
|
27-Dec-2017 |
Tomasz Majchrzak <tomasz.majchrzak@intel.com> |
raid5-ppl: PPL support for disks with write-back cache enabled In order to provide data consistency with PPL for disks with write-back cache enabled all data has to be flushed to disks before next PPL entry. The disks to be flushed are marked in the bitmap. It's modified under a mutex and it's only read after PPL io unit is submitted. A limitation of 64 disks in the array has been introduced to keep data structures and implementation simple. RAID5 arrays with so many disks are not likely due to high risk of multiple disks failure. Such restriction should not be a real life limitation. With write-back cache disabled next PPL entry is submitted when data write for current one completes. Data flush defers next log submission so trigger it when there are no stripes for handling found. As PPL assures all data is flushed to disk at request completion, just acknowledge flush request when PPL is enabled. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
|
#
07719ff7 |
|
29-Sep-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: check recovery_offset when performing ppl recovery If starting an array that is undergoing rebuild, make ppl recovery honor the recovery_offset of a member disk and don't read data that is not yet in-sync. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
611426e2 |
|
29-Sep-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: don't resync after rebuild The check for degraded array is unnecessary and causes a resync to be performed after ppl recovery and rebuild when restarting an array during rebuilding after unclean shutdown. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
675dc2cc |
|
16-Aug-2017 |
Pawel Baldysiak <pawel.baldysiak@intel.com> |
raid5-ppl: Recovery support for multiple partial parity logs Search PPL buffer in order to find out the latest PPL header (the one with largest generation number) and use it for recovery. The PPL entry format and recovery algorithm are the same as for single PPL approach. Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
ddc08823 |
|
16-Aug-2017 |
Pawel Baldysiak <pawel.baldysiak@intel.com> |
md: Runtime support for multiple ppls Increase PPL area to 1MB and use it as circular buffer to store PPL. The entry with highest generation number is the latest one. If PPL to be written is larger then space left in a buffer, rewind the buffer to the start (don't wrap it). Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
74d46992 |
|
23-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6409e84e |
|
11-Jul-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: use BIOSET_NEED_BVECS when creating bioset This bioset is used for allocating bios with nr_iovecs > 0 so this flag must be set. Fixes: 011067b05668 ("blk: replace bioset_create_nobvec() with a flags arg to bioset_create()") Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
011067b0 |
|
17-Jun-2017 |
NeilBrown <neilb@suse.com> |
blk: replace bioset_create_nobvec() with a flags arg to bioset_create() "flags" arguments are often seen as good API design as they allow easy extensibility. bioset_create_nobvec() is implemented internally as a variation in flags passed to __bioset_create(). To support future extension, make the internal structure part of the API. i.e. add a 'flags' argument to bioset_create() and discard bioset_create_nobvec(). Note that the bio_split allocations in drivers/md/raid* do not need the bvec mempool - they should have used bioset_create_nobvec(). Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4e4cbee9 |
|
03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
5a8948f8 |
|
31-May-2017 |
Jan Kara <jack@suse.cz> |
md: Make flush bios explicitely sync Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: linux-raid@vger.kernel.org CC: Shaohua Li <shli@kernel.org> Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
fcd403af |
|
11-Apr-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: use a single mempool for ppl_io_unit and header_page Allocate both struct ppl_io_unit and its header_page from a shared mempool to avoid a possible deadlock. Implement allocate and free functions for the mempool, remove the second pool for allocating header_page. The header_pages are now freed with their io_units, not when the ppl bio completes. Also, use GFP_NOWAIT instead of GFP_ATOMIC for allocating ppl_io_unit because we can handle failed allocations and there is no reason to utilize emergency reserves. Suggested-by: NeilBrown <neilb@suse.com> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
ae1713e2 |
|
04-Apr-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: partial parity calculation optimization In case of read-modify-write, partial partity is the same as the result of ops_run_prexor5(), so we can just copy sh->dev[pd_idx].page into sh->ppl_page instead of calculating it again. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
845b9e22 |
|
04-Apr-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: use resize_stripes() when enabling or disabling ppl Use resize_stripes() instead of raid5_reset_stripe_cache() to allocate or free sh->ppl_page at runtime for all stripes in the stripe cache. raid5_reset_stripe_cache() required suspending the mddev and could deadlock because of GFP_KERNEL allocations. Move the 'newsize' check to check_reshape() to allow reallocating the stripes with the same number of disks. Allocate sh->ppl_page in alloc_stripe() instead of grow_buffers(). Pass 'struct r5conf *conf' as a parameter to alloc_stripe() because it is needed to check whether to allocate ppl_page. Add free_stripe() and use it to free stripes rather than directly call kmem_cache_free(). Also free sh->ppl_page in free_stripe(). Set MD_HAS_PPL at the end of ppl_init_log() instead of explicitly setting it in advance and add another parameter to log_init() to allow calling ppl_init_log() without the bit set. Don't try to calculate partial parity or add a stripe to log if it does not have ppl_page set. Enabling ppl can now be performed without suspending the mddev, because the log won't be used until new stripes are allocated with ppl_page. Calling mddev_suspend/resume is still necessary when disabling ppl, because we want all stripes to finish before stopping the log, but resize_stripes() can be called after mddev_resume() when ppl is no longer active. Suggested-by: NeilBrown <neilb@suse.com> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
94568f64 |
|
04-Apr-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: move no_mem_stripes to struct ppl_conf Use a single no_mem_stripes list instead of per member device lists for handling stripes that need retrying in case of failed io_unit allocation. Because io_units are allocated from a memory pool shared between all member disks, the no_mem_stripes list should be checked when an io_unit for any member is freed. This fixes a deadlock that could happen if there are stripes in more than one no_mem_stripes list. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
0b408baf |
|
21-Mar-2017 |
Dan Carpenter <dan.carpenter@oracle.com> |
raid5-ppl: silence a misleading warning message The "need_cache_flush" variable is never set to false. When the variable is true that means we print a warning message at the end of the function. Fixes: 3418d036c81d ("raid5-ppl: Partial Parity Log write logging implementation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
ba903a3e |
|
09-Mar-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: runtime PPL enabling or disabling Allow writing to 'consistency_policy' attribute when the array is active. Add a new function 'change_consistency_policy' to the md_personality operations structure to handle the change in the personality code. Values "ppl" and "resync" are accepted and turn PPL on and off respectively. When enabling PPL its location and size should first be set using 'ppl_sector' and 'ppl_size' attributes and a valid PPL header should be written at this location on each member device. Enabling or disabling PPL is performed under a suspended array. The raid5_reset_stripe_cache function frees the stripe cache and allocates it again in order to allocate or free the ppl_pages for the stripes in the stripe cache. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
6358c239 |
|
09-Mar-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: support disk hot add/remove with PPL Add a function to modify the log by removing an rdev when a drive fails or adding when a spare/replacement is activated as a raid member. Removing a disk just clears the child log rdev pointer. No new stripes will be accepted for this child log in ppl_write_stripe() and running io units will be processed without writing PPL to the device. Adding a disk sets the child log rdev pointer and writes an empty PPL header. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
4536bf9b |
|
09-Mar-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: load and recover the log Load the log from each disk when starting the array and recover if the array is dirty. The initial empty PPL is written by mdadm. When loading the log we verify the header checksum and signature. For external metadata arrays the signature is verified in userspace, so here we read it from the header, verifying only if it matches on all disks, and use it later when writing PPL. In addition to the header checksum, each header entry also contains a checksum of its partial parity data. If the header is valid, recovery is performed for each entry until an invalid entry is found. If the array is not degraded and recovery using PPL fully succeeds, there is no need to resync the array because data and parity will be consistent, so in this case resync will be disabled. Due to compatibility with IMSM implementations on other systems, we can't assume that the recovery data block size is always 4K. Writes generated by MD raid5 don't have this issue, but when recovering PPL written in other environments it is possible to have entries with 512-byte sector granularity. The recovery code takes this into account and also the logical sector size of the underlying drives. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|
#
3418d036 |
|
09-Mar-2017 |
Artur Paszkiewicz <artur.paszkiewicz@intel.com> |
raid5-ppl: Partial Parity Log write logging implementation Implement the calculation of partial parity for a stripe and PPL write logging functionality. The description of PPL is added to the documentation. More details can be found in the comments in raid5-ppl.c. Attach a page for holding the partial parity data to stripe_head. Allocate it only if mddev has the MD_HAS_PPL flag set. Partial parity is the xor of not modified data chunks of a stripe and is calculated as follows: - reconstruct-write case: xor data from all not updated disks in a stripe - read-modify-write case: xor old data and parity from all updated disks in a stripe Implement it using the async_tx API and integrate into raid_run_ops(). It must be called when we still have access to old data, so do it when STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is stored into sh->ppl_page. Partial parity is not meaningful for full stripe write and is not stored in the log or used for recovery, so don't attempt to calculate it when stripe has STRIPE_FULL_WRITE. Put the PPL metadata structures to md_p.h because userspace tools (mdadm) will also need to read/write PPL. Warn about using PPL with enabled disk volatile write-back cache for now. It can be removed once disk cache flushing before writing PPL is implemented. Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
|