#
217759bb |
|
23-Jan-2024 |
Christian Brauner <brauner@kernel.org> |
xen: port block device access to file Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-11-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
436d3705 |
|
27-Sep-2023 |
Jan Kara <jack@suse.cz> |
xen/blkback: Convert to bdev_open_by_dev() Convert xen/blkback to use bdev_open_by_dev() and pass the handle around. CC: xen-devel@lists.xenproject.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-7-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
cbfac770 |
|
16-Dec-2022 |
Juergen Gross <jgross@suse.com> |
xen/blkback: move blkif_get_x86_*_req() into blkback.c There is no need to have the functions blkif_get_x86_32_req() and blkif_get_x86_64_req() in a header file, as they are used in one place only. So move them into the using source file and drop the inline qualifier. While at it fix some style issues, and simplify the code by variable reusing and using min() instead of open coding it. Instead of using barrier() use READ_ONCE() for avoiding multiple reads of nr_segments. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
e7b4c07d |
|
12-Dec-2022 |
Juergen Gross <jgross@suse.com> |
xen/blkback: simplify free_persistent_gnts() interface The interface of free_persistent_gnts() can be simplified, as there is only a single caller of free_persistent_gnts() and the 2nd and 3rd parameters are easily obtainable via the ring pointer, which is passed as the first parameter anyway. Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
6935321e |
|
12-Dec-2022 |
Juergen Gross <jgross@suse.com> |
xen/blkback: fix white space code style issues Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
6c5412e2 |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
xen-blkback: Use the enum req_op and blk_opf_t types Improve static type checking by using the enum req_op type for request operations and the new blk_opf_t type for request flags. Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-18-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
44abff2c |
|
14-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
93b4e747 |
|
17-Mar-2022 |
Colin Ian King <colin.king@intel.com> |
xen-blkback: remove redundant assignment to variable i Variable i is being assigned a value that is never read, it is being re-assigned later in a for-loop. The assignment is redundant and can be removed. Cleans up clang scan build warning: drivers/block/xen-blkback/blkback.c:934:14: warning: Although the value stored to 'i' is used in the enclosing expression, the value is never actually read from 'i' [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220317234646.78158-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
07888c66 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
7d8d0c65 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
xen-blkback: bio_alloc can't fail if it is allow to sleep Remove handling of NULL returns from sleeping bio_alloc calls given that those can't fail. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a846738f |
|
26-Mar-2021 |
Jan Beulich <jbeulich@suse.com> |
xen-blkback: don't leak persistent grants from xen_blkbk_map() The fix for XSA-365 zapped too many of the ->persistent_gnt[] entries. Ones successfully obtained should not be overwritten, but instead left for xen_blkbk_unmap_prepare() to pick up and put. This is XSA-371. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
5f7136db |
|
28-Jan-2021 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
block: Add bio_max_segs It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to be unsigned to make it easier for the users. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
871997bc |
|
15-Feb-2021 |
Jan Beulich <jbeulich@suse.com> |
xen-blkback: fix error handling in xen_blkbk_map() The function uses a goto-based loop, which may lead to an earlier error getting discarded by a later iteration. Exit this ad-hoc loop when an error was encountered. The out-of-memory error path additionally fails to fill a structure field looked at by xen_blkbk_unmap_prepare() before inspecting the handle which does get properly set (to BLKBACK_INVALID_HANDLE). Since the earlier exiting from the ad-hoc loop requires the same field filling (invalidation) as that on the out-of-memory path, fold both paths. While doing so, drop the pr_alert(), as extra log messages aren't going to help the situation (the kernel will log oom conditions already anyway). This is XSA-365. Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
5a264285 |
|
15-Feb-2021 |
Jan Beulich <jbeulich@suse.com> |
xen-blkback: don't "handle" error by BUG() In particular -ENOMEM may come back here, from set_foreign_p2m_mapping(). Don't make problems worse, the more that handling elsewhere (together with map's status fields now indicating whether a mapping wasn't even attempted, and hence has to be considered failed) doesn't require this odd way of dealing with errors. This is part of XSA-362. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
ca33479c |
|
07-Dec-2020 |
Juergen Gross <jgross@suse.com> |
xen: add helpers for caching grant mapping pages Instead of having similar helpers in multiple backend drivers use common helpers for caching pages allocated via gnttab_alloc_pages(). Make use of those helpers in blkback and scsiback. Cc: <stable@vger.kernel.org> # 5.9 Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovksy@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
01263a1f |
|
07-Sep-2020 |
Juergen Gross <jgross@suse.com> |
xen/blkback: use lateeoi irq binding In order to reduce the chance for the system becoming unresponsive due to event storms triggered by a misbehaving blkfront use the lateeoi irq binding for blkback and unmask the event channel only after processing all pending requests. As the thread processing requests is used to do purging work in regular intervals an EOI may be sent only after having received an event. If there was no pending I/O request flag the EOI as spurious. This is part of XSA-332. Cc: stable@vger.kernel.org Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
|
#
df561f66 |
|
23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
|
#
823f2091 |
|
27-Jan-2020 |
SeongJae Park <sjpark@amazon.de> |
xen/blkback: Remove unnecessary static variable name prefixes A few of static variables in blkback have 'xen_blkif_' prefix, though it is unnecessary for static variables. This commit removes such prefixes. Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
cb9369bd |
|
27-Jan-2020 |
SeongJae Park <sjpark@amazon.de> |
xen/blkback: Squeeze page pools if a memory pressure is detected Each `blkif` has a free pages pool for the grant mapping. The size of the pool starts from zero and is increased on demand while processing the I/O requests. If current I/O requests handling is finished or 100 milliseconds has passed since last I/O requests handling, it checks and shrinks the pool to not exceed the size limit, `max_buffer_pages`. Therefore, host administrators can cause memory pressure in blkback by attaching a large number of block devices and inducing I/O. Such problematic situations can be avoided by limiting the maximum number of devices that can be attached, but finding the optimal limit is not so easy. Improper set of the limit can results in memory pressure or a resource underutilization. This commit avoids such problematic situations by squeezing the pools (returns every free page in the pool to the system) for a while (users can set this duration via a module parameter) if memory pressure is detected. Discussions =========== The `blkback`'s original shrinking mechanism returns only pages in the pool which are not currently be used by `blkback` to the system. In other words, the pages that are not mapped with granted pages. Because this commit is changing only the shrink limit but still uses the same freeing mechanism it does not touch pages which are currently mapping grants. Once memory pressure is detected, this commit keeps the squeezing limit for a user-specified time duration. The duration should be neither too long nor too short. If it is too long, the squeezing incurring overhead can reduce the I/O performance. If it is too short, `blkback` will not free enough pages to reduce the memory pressure. This commit sets the value as `10 milliseconds` by default because it is a short time in terms of I/O while it is a long time in terms of memory operations. Also, as the original shrinking mechanism works for at least every 100 milliseconds, this could be a somewhat reasonable choice. I also tested other durations (refer to the below section for more details) and confirmed that 10 milliseconds is the one that works best with the test. That said, the proper duration depends on actual configurations and workloads. That's why this commit allows users to set the duration as a module parameter. Memory Pressure Test ==================== To show how this commit fixes the memory pressure situation well, I configured a test environment on a xen-running virtualization system. On the `blkfront` running guest instances, I attach a large number of network-backed volume devices and induce I/O to those. Meanwhile, I measure the number of pages that swapped in (pswpin) and out (pswpout) on the `blkback` running guest. The test ran twice, once for the `blkback` before this commit and once for that after this commit. As shown below, this commit has dramatically reduced the memory pressure: pswpin pswpout before 76,672 185,799 after 867 3,967 Optimal Aggressive Shrinking Duration ------------------------------------- To find a best squeezing duration, I repeated the test with three different durations (1ms, 10ms, and 100ms). The results are as below: duration pswpin pswpout 1 707 5,095 10 867 3,967 100 362 3,348 As expected, the memory pressure decreases as the duration increases, but the reduction become slow from the `10ms`. Based on this results, I chose the default duration as 10ms. Performance Overhead Test ========================= This commit could incur I/O performance degradation under severe memory pressure because the squeezing will require more page allocations per I/O. To show the overhead, I artificially made a worst-case squeezing situation and measured the I/O performance of a `blkfront` running guest. For the artificial squeezing, I set the `blkback.max_buffer_pages` using the `/sys/module/xen_blkback/parameters/max_buffer_pages` file. In this test, I set the value to `1024` and `0`. The `1024` is the default value. Setting the value as `0` is same to a situation doing the squeezing always (worst-case). If the underlying block device is slow enough, the squeezing overhead could be hidden. For the reason, I use a fast block device, namely the rbd[1]: # xl block-attach guest phy:/dev/ram0 xvdb w For the I/O performance measurement, I run a simple `dd` command 5 times directly to the device as below and collect the 'MB/s' results. $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \ bs=4k count=$((256*512)); sync; done The results are as below. 'max_pgs' represents the value of the `blkback.max_buffer_pages` parameter. max_pgs Min Max Median Avg Stddev 0 417 423 420 419.4 2.5099801 1024 414 425 416 417.8 4.4384682 No difference proven at 95.0% confidence In short, even worst case squeezing on ramdisk based fast block device makes no visible performance degradation. Please note that this is just a very simple and minimal test. On systems using super-fast block devices and a special I/O workload, the results might be different. If you have any doubt, test on your machine with your workload to find the optimal squeezing duration for you. [1] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
#
14855954 |
|
02-Dec-2019 |
Paul Durrant <pdurrant@amazon.com> |
xen-blkback: allow module to be cleanly unloaded Add a module_exit() to perform the necessary clean-up. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
|
#
f9bd84a8 |
|
26-Nov-2019 |
SeongJae Park <sjpark@amazon.de> |
xen/blkback: Avoid unmapping unmapped grant pages For each I/O request, blkback first maps the foreign pages for the request to its local pages. If an allocation of a local page for the mapping fails, it should unmap every mapping already made for the request. However, blkback's handling mechanism for the allocation failure does not mark the remaining foreign pages as unmapped. Therefore, the unmap function merely tries to unmap every valid grant page for the request, including the pages not mapped due to the allocation failure. On a system that fails the allocation frequently, this problem leads to following kernel crash. [ 372.012538] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [ 372.012546] IP: [<ffffffff814071ac>] gnttab_unmap_refs.part.7+0x1c/0x40 [ 372.012557] PGD 16f3e9067 PUD 16426e067 PMD 0 [ 372.012562] Oops: 0002 [#1] SMP [ 372.012566] Modules linked in: act_police sch_ingress cls_u32 ... [ 372.012746] Call Trace: [ 372.012752] [<ffffffff81407204>] gnttab_unmap_refs+0x34/0x40 [ 372.012759] [<ffffffffa0335ae3>] xen_blkbk_unmap+0x83/0x150 [xen_blkback] ... [ 372.012802] [<ffffffffa0336c50>] dispatch_rw_block_io+0x970/0x980 [xen_blkback] ... Decompressing Linux... Parsing ELF... done. Booting the kernel. [ 0.000000] Initializing cgroup subsys cpuset This commit fixes this problem by marking the grant pages of the given request that didn't mapped due to the allocation failure as invalid. Fixes: c6cc142dac52 ("xen-blkback: use balloon pages for all mappings") Reviewed-by: David Woodhouse <dwmw@amazon.de> Reviewed-by: Maximilian Heyne <mheyne@amazon.de> Reviewed-by: Paul Durrant <pdurrant@amazon.co.uk> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d77ff24e |
|
13-Aug-2018 |
Juergen Gross <jgross@suse.com> |
xen/blkback: move persistent grants flags to bool The struct persistent_gnt flags member is meant to be a bitfield of different flags. There is only PERSISTENT_GNT_ACTIVE flag left, so convert it to a bool named "active". Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
973e5405 |
|
13-Aug-2018 |
Juergen Gross <jgross@suse.com> |
xen/blkback: don't keep persistent grants too long Persistent grants are allocated until a threshold per ring is being reached. Those grants won't be freed until the ring is being destroyed meaning there will be resources kept busy which might no longer be used. Instead of freeing only persistent grants until the threshold is reached add a timestamp and remove all persistent grants not having been in use for a minute. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5657a819 |
|
24-May-2018 |
Joe Perches <joe@perches.com> |
block drivers/block: Use octal not symbolic permissions Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
74d46992 |
|
23-Aug-2017 |
Christoph Hellwig <hch@lst.de> |
block: replace bi_bdev with a gendisk pointer and partitions index This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3f2c9405 |
|
17-Aug-2017 |
Bart Van Assche <bvanassche@acm.org> |
xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monn303251 <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
306b82a8 |
|
17-Aug-2017 |
Bart Van Assche <bvanassche@acm.org> |
xen-blkback: Fix indentation Avoid that smatch reports the following warning when building with C=2 CHECK="smatch -p=kernel": drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roger Pau Monn303251 <roger.pau@citrix.com> Cc: xen-devel@lists.xenproject.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
089bc014 |
|
13-Jun-2017 |
Jan Beulich <jbeulich@suse.com> |
xen-blkback: don't leak stack data via response ring Rather than constructing a local structure instance on the stack, fill the fields directly on the shared ring, just like other backends do. Build on the fact that all response structure flavors are actually identical (the old code did make this assumption too). This is XSA-216. Cc: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a24fa22c |
|
18-May-2017 |
Juergen Gross <jgross@suse.com> |
xen/blkback: don't use xen_blkif_get() in xen-blkback kthread There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread of xen-blkback. Thread stopping is synchronous and using the blkif reference counting in the kthread will avoid to ever let the reference count drop to zero at the end of an I/O running concurrent to disconnecting and multiple rings. Setting ring->xenblkd to NULL after stopping the kthread isn't needed as the kthread does this already. Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Steven Haigh <netwiz@crc.id.au> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4e4cbee9 |
|
03-Jun-2017 |
Christoph Hellwig <hch@lst.de> |
block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
70fd7614 |
|
01-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
a022606e |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
xen: use bio op accessors Separate the op from the rq_flag_bits and have xen set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
4e49ea4a |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
block/fs/drivers: remove rw argument from submit_bio This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
db6fbc10 |
|
08-Dec-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/blkback: make st_ statistics per ring Make st_* statistics per ring and the VBD sysfs would iterate over all the rings. Note: xenvbd_sysfs_delif() is called in xen_blkbk_remove() before all rings are torn down, so it's safe. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Aligned the variables on the same column.
|
#
a6e7af12 |
|
25-Oct-2015 |
Jiri Kosina <jkosina@suse.cz> |
xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule() xen_blkif_schedule() kthread calls try_to_freeze() at the beginning of every attempt to purge the LRU. This operation can't ever succeed though, as the kthread hasn't marked itself as freezable. Before (hopefully eventually) kthread freezing gets converted to fileystem freezing, we'd rather mark xen_blkif_schedule() freezable (as it can generate I/O during suspend). Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
d4bf0065 |
|
13-Nov-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/blkback: make pool of persistent grants and free pages per-queue Make pool of persistent grants and free pages per-queue/ring instead of per-device to get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Results: iops1: After patch "xen/blkfront: make persistent grants per-queue". iops2: After this patch. Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops1(k): 810 1230(~20%) 1024(~20%) 850(~20%) Iops2(k): 810 1410(~35%) 1354(~75%) 1440(~100%) With 4 queues after this commit we can get ~75% increase in IOPS, and performance won't drop if increasing queue numbers. Please find the respective chart in this link: https://www.dropbox.com/s/agrcy2pbzbsvmwv/iops.png?dl=0 Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
d62d8600 |
|
13-Nov-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/blkback: get the number of hardware queues/rings from blkfront Backend advertises "multi-queue-max-queues" to front, also get the negotiated number from "multi-queue-num-queues" written by blkfront. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
59795700 |
|
13-Nov-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/blkback: separate ring information out of struct xen_blkif Split per ring information to an new structure "xen_blkif_ring", so that one vbd device can be associated with one or more rings/hardware queues. Introduce 'pers_gnts_lock' to protect the pool of persistent grants since we may have multi backend threads. This patch is a preparation for supporting multi hardware queues/rings. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- v2: Align the variables in the structure.
|
#
18779149 |
|
03-Nov-2015 |
Roger Pau Monné <roger.pau@citrix.com> |
xen-blkback: read from indirect descriptors only once Since indirect descriptors are in memory shared with the frontend, the frontend could alter the first_sect and last_sect values after they have been validated but before they are recorded in the request. This may result in I/O requests that overflow the foreign page, possibly overwriting local pages when the I/O request is executed. When parsing indirect descriptors, only read first_sect and last_sect once. This is part of XSA155. CC: stable@vger.kernel.org Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
9cce2914 |
|
13-Oct-2015 |
Julien Grall <julien.grall@citrix.com> |
xen/xenbus: Rename *RING_PAGE* to *RING_GRANT* Linux may use a different page size than the size of grant. So make clear that the order is actually in number of grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
67de5dfb |
|
05-May-2015 |
Julien Grall <julien.grall@citrix.com> |
block/xen-blkback: Make it running on 64KB page granularity The PV block protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity behaving as a block backend on a non-modified Xen. It's only necessary to adapt the ring size and the number of request per indirect frames. The rest of the code is relying on the grant table code. Note that the grant table code is allocating a Linux page per grant which will result to waste 6OKB for every grant when Linux is using 64KB page granularity. This could be improved by sharing the page between multiple grants. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
4246a0b6 |
|
20-Jul-2015 |
Christoph Hellwig <hch@lst.de> |
block: add a bi_error field to struct bio Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
53bc7dc0 |
|
22-Jul-2015 |
Bob Liu <bob.liu@oracle.com> |
xen-blkback: replace work_pending with work_busy in purge_persistent_gnt() The BUG_ON() in purge_persistent_gnt() will be triggered when previous purge work haven't finished. There is a work_pending() before this BUG_ON, but it doesn't account if the work is still currently running. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
6684fa1c |
|
17-Jun-2015 |
Julien Grall <julien.grall@linaro.org> |
block/xen-blkback: s/nr_pages/nr_segs/ Make the code less confusing to read now that Linux may not have the same page size as Xen. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
86839c56 |
|
02-Jun-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/block: add multi-page ring support Extend xen/block to support multi-page ring, so that more requests can be issued by using more than one pages as the request ring between blkfront and backend. As a result, the performance can get improved significantly. We got some impressive improvements on our highend iscsi storage cluster backend. If using 64 pages as the ring, the IOPS increased about 15 times for the throughput testing and above doubled for the latency testing. The reason was the limit on outstanding requests is 32 if use only one-page ring, but in our case the iscsi lun was spread across about 100 physical drives, 32 was really not enough to keep them busy. Changes in v2: - Rebased to 4.0-rc6. - Document on how multi-page ring feature working to linux io/blkif.h. Changes in v3: - Remove changes to linux io/blkif.h and follow the protocol defined in io/blkif.h of XEN tree. - Rebased to 4.1-rc3 Changes in v4: - Turn to use 'ring-page-order' and 'max-ring-page-order'. - A few comments from Roger. Changes in v5: - Clarify with 4k granularity to comment - Address more comments from Roger Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b44166cd |
|
03-Apr-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/grant: introduce func gnttab_unmap_refs_sync() There are several place using gnttab async unmap and wait for completion, so move the common code to a function gnttab_unmap_refs_sync(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
325d73bf |
|
03-Apr-2015 |
Bob Liu <bob.liu@oracle.com> |
xen/blkback: safely unmap purge persistent grants Commit c43cf3ea8385 ("xen-blkback: safely unmap grants in case they are still in use") use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them, but that commit missed the persistent case. Purge persistent pages can't be unmapped either unless no longer in use. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
77387b82 |
|
01-Apr-2015 |
Tao Chen <boby.chen@huawei.com> |
xen-blkback: define pr_fmt macro to avoid the duplication of DRV_PFX Define pr_fmt macro with {xen-blkback: } prefix, then remove all use of DRV_PFX in the pr sentences. Replace all DPRINTK with pr sentences, and get rid of DPRINTK macro. It will simplify the code. And if the pr sentences miss a \n, add it in the end. If the DPRINTK sentences have redundant \n, remove it. It will format the code. These all make the readability of the code become better. Signed-off-by: Tao Chen <boby.chen@huawei.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
|
#
c43cf3ea |
|
05-Jan-2015 |
Jennifer Herbert <jennifer.herbert@citrix.com> |
xen-blkback: safely unmap grants in case they are still in use Use gnttab_unmap_refs_async() to wait until the mapped pages are no longer in use before unmapping them. This allows blkback to use network storage which may retain refs to pages in queued skbs after the block I/O has completed. Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jens Axboe <axboe@kernel.de> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|
#
ff4b156f |
|
08-Jan-2015 |
David Vrabel <david.vrabel@citrix.com> |
xen/grant-table: add helpers for allocating pages Add gnttab_alloc_pages() and gnttab_free_pages() to allocate/free pages suitable to for granted maps. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
|
#
61cecca8 |
|
15-Sep-2014 |
Roger Pau Monné <roger.pau@citrix.com> |
xen-blkback: fix leak on grant map error path Fix leaking a page when a grant mapping has failed. CC: stable@vger.kernel.org Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-and-Tested-by: Tao Chen <boby.chen@huawei.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
abb97b8c |
|
11-Feb-2014 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: init persistent_purge_work work_struct Initialize persistent_purge_work work_struct on xen_blkif_alloc (and remove the previous initialization done in purge_persistent_gnt). This prevents flush_work from complaining even if purge_persistent_gnt has not been used. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
80bfa2f6 |
|
04-Feb-2014 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkif: drop struct blkif_request_segment_aligned This was wrongly introduced in commit 402b27f9, the only difference between blkif_request_segment_aligned and blkif_request_segment is that the former has a named padding, while both share the same memory layout. Also correct a few minor glitches in the description, including for it to no longer assume PAGE_SIZE == 4096. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> [Description fix by Jan Beulich] Signed-off-by: Jan Beulich <jbeulich@suse.com> Reported-by: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Matt Rushton <mrushton@amazon.com> Cc: Matt Wilson <msw@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c05f3e3c |
|
04-Feb-2014 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: fix shutdown race Introduce a new variable to keep track of the number of in-flight requests. We need to make sure that when xen_blkif_put is called the request has already been freed and we can safely free xen_blkif, which was not the case before. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Matt Rushton <mrushton@amazon.com> Reviewed-by: Matt Rushton <mrushton@amazon.com> Cc: Matt Wilson <msw@amazon.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ef753411 |
|
04-Feb-2014 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: fix memory leaks I've at least identified two possible memory leaks in blkback, both related to the shutdown path of a VBD: - blkback doesn't wait for any pending purge work to finish before cleaning the list of free_pages. The purge work will call put_free_pages and thus we might end up with pages being added to the free_pages list after we have emptied it. Fix this by making sure there's no pending purge work before exiting xen_blkif_schedule, and moving the free_page cleanup code to xen_blkif_free. - blkback doesn't wait for pending requests to end before cleaning persistent grants and the list of free_pages. Again this can add pages to the free_pages list or persistent grants to the persistent_gnts red-black tree. Fixed by moving the persistent grants and free_pages cleanup code to xen_blkif_free. Also, add some checks in xen_blkif_free to make sure we are cleaning everything. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Matt Rushton <mrushton@amazon.com> Reviewed-by: Matt Rushton <mrushton@amazon.com> Cc: Matt Wilson <msw@amazon.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2ed22e3c |
|
04-Feb-2014 |
Matt Rushton <mrushton@amazon.com> |
xen-blkback: fix memory leak when persistent grants are used Currently shrink_free_pagepool() is called before the pages used for persistent grants are released via free_persistent_gnts(). This results in a memory leak when a VBD that uses persistent grants is torn down. Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xen.org Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: Matt Rushton <mrushton@amazon.com> Signed-off-by: Matt Wilson <msw@amazon.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
e85fc980 |
|
03-Feb-2014 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen/grant-table: Avoid m2p_override during mapping" This reverts commit 08ece5bb2312b4510b161a6ef6682f37f4eac8a1. As it breaks ARM builds and needs more attention on the ARM side. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
08ece5bb |
|
23-Jan-2014 |
Zoltan Kiss <zoltan.kiss@citrix.com> |
xen/grant-table: Avoid m2p_override during mapping The grant mapping API does m2p_override unnecessarily: only gntdev needs it, for blkback and future netback patches it just cause a lock contention, as those pages never go to userspace. Therefore this series does the following: - the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override - based on m2p_override either they follow the original behaviour, or just set the private flag and call set_phys_to_machine - gnttab_[un]map_refs are now a wrapper to call __gnttab_[un]map_refs with m2p_override false - a new function gnttab_[un]map_refs_userspace provides the old behaviour It also removes a stray space from page.h and change ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there. v2: - move the storing of the old mfn in page->index to gnttab_map_refs - move the function header update to a separate patch v3: - a new approach to retain old behaviour where it needed - squash the patches into one v4: - move out the common bits from m2p* functions, and pass pfn/mfn as parameter - clear page->private before doing anything with the page, so m2p_find_override won't race with this v5: - change return value handling in __gnttab_[un]map_refs - remove a stray space in page.h - add detail why ret = 0 now at some places v6: - don't pass pfn to m2p* functions, just get it locally Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Suggested-by: David Vrabel <david.vrabel@citrix.com> Acked-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4f024f37 |
|
11-Oct-2013 |
Kent Overstreet <kmo@daterainc.com> |
block: Abstract out bvec iterator Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
|
#
ea5ec76d |
|
05-Sep-2013 |
Vegard Nossum <vegard.nossum@oracle.com> |
xen/blkback: fix reference counting If the permission check fails, we drop a reference to the blkif without having taken it in the first place. The bug was introduced in commit 604c499cbbcc3d5fe5fb8d53306aa0fae1990109 (xen/blkback: Check device permissions before allowing OP_DISCARD). Cc: stable@vger.kernel.org Cc: Jan Beulich <JBeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1e0f7a21 |
|
22-Jun-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: check the number of iovecs before allocating a bios With the introduction of indirect segments we can receive requests with a number of segments bigger than the maximum number of allowed iovecs in a bios, so make sure that blkback doesn't try to allocate a bios with more iovecs than BIO_MAX_PAGES Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
2d910543 |
|
20-Jun-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: workaround compiler bug in gcc 4.1 The code generat with gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-54) creates an unbound loop for the second foreach_grant_safe loop in purge_persistent_gnt. The workaround is to avoid having this second loop and instead perform all the work inside the first loop by adding a new variable, clean_used, that will be set when all the desired persistent grants have been removed and we need to iterate over the remaining ones to remove the WAS_ACTIVE flag. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Tom O'Neill <toneill@vmem.com> Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8e3f8755 |
|
23-Jan-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Check for insane amounts of request on the ring (v6). Check that the ring does not have an insane amount of requests (more than there could fit on the ring). If we detect this case we will stop processing the requests and wait until the XenBus disconnects the ring. The existing check RING_REQUEST_CONS_OVERFLOW which checks for how many responses we have created in the past (rsp_prod_pvt) vs requests consumed (req_cons) and whether said difference is greater or equal to the size of the ring, does not catch this case. Wha the condition does check if there is a need to process more as we still have a backlog of responses to finish. Note that both of those values (rsp_prod_pvt and req_cons) are not exposed on the shared ring. To understand this problem a mini crash course in ring protocol response/request updates is in place. There are four entries: req_prod and rsp_prod; req_event and rsp_event to track the ring entries. We are only concerned about the first two - which set the tone of this bug. The req_prod is a value incremented by frontend for each request put on the ring. Conversely the rsp_prod is a value incremented by the backend for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when pushing the responses on the ring). Both values can wrap and are modulo the size of the ring (in block case that is 32). Please see RING_GET_REQUEST and RING_GET_RESPONSE for the more details. The culprit here is that if the difference between the req_prod and req_cons is greater than the ring size we have a problem. Fortunately for us, the '__do_block_io_op' loop: rc = blk_rings->common.req_cons; rp = blk_rings->common.sring->req_prod; while (rc != rp) { .. blk_rings->common.req_cons = ++rc; /* before make_response() */ } will loop up to the point when rc == rp. The macros inside of the loop (RING_GET_REQUEST) is smart and is indexing based on the modulo of the ring size. If the frontend has provided a bogus req_prod value we will loop until the 'rc == rp' - which means we could be processing already processed requests (or responses) often. The reason the RING_REQUEST_CONS_OVERFLOW is not helping here is b/c it only tracks how many responses we have internally produced and whether we would should process more. The astute reader will notice that the macro RING_REQUEST_CONS_OVERFLOW provides two arguments - more on this later. For example, if we were to enter this function with these values: blk_rings->common.sring->req_prod = X+31415 (X is the value from the last time __do_block_io_op was called). blk_rings->common.req_cons = X blk_rings->common.rsp_prod_pvt = X The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons) is doing: req_cons - rsp_prod_pvt >= 32 Which is, X - X >= 32 or 0 >= 32 And that is false, so we continue on looping (this bug). If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in the rp instead (sring->req_prod) of rc, the this macro can do the check: req_prod - rsp_prov_pvt >= 32 Which is, X + 31415 - X >= 32 , or 31415 >= 32 which is true, so we can error out and break out of the function. Unfortunatly the difference between rsp_prov_pvt and req_prod can be at 32 (which would error out in the macro). This condition exists when the backend is lagging behind with the responses and still has not finished responding to all of them (so make_response has not been called), and the rsp_prov_pvt + 32 == req_cons. This ends up with us not being able to use said macro. Hence introducing a new macro called RING_REQUEST_PROD_OVERFLOW which does a simple check of: req_prod - rsp_prod_pvt > RING_SIZE And with the X values from above: X + 31415 - X > 32 Returns true. Also not that if the ring is full (which is where the RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the same condition: X + 32 - X > 32 Which is false. Lets use that macro. Note that in v5 of this patchset the macro was different - we used an earlier version. Cc: stable@vger.kernel.org [v1: Move the check outside the loop] [v2: Add a pr_warn as suggested by David] [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan] [v4: Move wake_up after kthread_stop as suggested by Jan] [v5: Use RING_REQUEST_PROD_OVERFLOW instead] [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> gadsa
|
#
604c499c |
|
16-Jan-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Check device permissions before allowing OP_DISCARD We need to make sure that the device is not RO or that the request is not past the number of sectors we want to issue the DISCARD operation for. This fixes CVE-2013-2140. Cc: stable@vger.kernel.org Acked-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> [v1: Made it pr_warn instead of pr_debug] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
bb642e83 |
|
02-May-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: allocate list of pending reqs in small chunks Allocate pending requests in smaller chunks instead of allocating them all at the same time. This change also removes the global array of pending_reqs, it is no longer necessay. Variables related to the grant mapping have been grouped into a struct called "grant_page", this allows to allocate them in smaller chunks, and also improves memory locality. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
402b27f9 |
|
18-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-block: implement indirect descriptors Indirect descriptors introduce a new block operation (BLKIF_OP_INDIRECT) that passes grant references instead of segments in the request. This grant references are filled with arrays of blkif_request_segment_aligned, this way we can send more segments in a request. The proposed implementation sets the maximum number of indirect grefs (frames filled with blkif_request_segment_aligned) to 256 in the backend and 32 in the frontend. The value in the frontend has been chosen experimentally, and the backend value has been set to a sane value that allows expanding the maximum number of indirect descriptors in the frontend if needed. The migration code has changed from the previous implementation, in which we simply remapped the segments on the shared ring. Now the maximum number of segments allowed in a request can change depending on the backend, so we have to requeue all the requests in the ring and in the queue and split the bios in them if they are bigger than the new maximum number of segments. [v2: Fixed minor comments by Konrad. [v1: Added padding to make the indirect request 64bit aligned. Added some BUGs, comments; fixed number of indirect pages in blkif_get_x86_{32/64}_req. Added description about the indirect operation in blkif.h] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> [v3: Fixed spaces and tabs mix ups] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
31552ee3 |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: expand map/unmap functions Preparatory change for implementing indirect descriptors. Change xen_blkbk_{map/unmap} in order to be able to map/unmap a random amount of grants (previously it was limited to BLKIF_MAX_SEGMENTS_PER_REQUEST). Also, remove the usage of pending_req in the map/unmap functions, so we can map/unmap grants without needing to pass a pending_req. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
bf0720c4 |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: make the queue of free requests per backend Remove the last dependency from blkbk by moving the list of free requests to blkif. This change reduces the contention on the list of available requests. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
bb6acb28 |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: move pending handles list from blkbk to pending_req Moving grant ref handles from blkbk to pending_req will allow us to get rid of the shared blkbk structure. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
3f3aad5e |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: implement LRU mechanism for persistent grants This mechanism allows blkback to change the number of grants persistently mapped at run time. The algorithm uses a simple LRU mechanism that removes (if needed) the persistent grants that have not been used since the last LRU run, or if all grants have been used it removes the first grants in the list (that are not in use). The algorithm allows the user to change the maximum number of persistent grants, by changing max_persistent_grants in sysfs. Since we are storing the persistent grants used inside the request struct (to be able to mark them as "unused" when unmapping), we no longer need the bitmap (unmap_seg). Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c6cc142d |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: use balloon pages for all mappings Using balloon pages for all granted pages allows us to simplify the logic in blkback, especially in the xen_blkbk_map function, since now we can decide if we want to map a grant persistently or not after we have actually mapped it. This could not be done before because persistent grants used ballooned pages, whereas non-persistent grants used pages from the kernel. This patch also introduces several changes, the first one is that the list of free pages is no longer global, now each blkback instance has it's own list of free pages that can be used to map grants. Also, a run time parameter (max_buffer_pages) has been added in order to tune the maximum number of free pages each blkback instance will keep in it's buffer. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
c1a15d08 |
|
17-Apr-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: print stats about persistent grants Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ffb1dabd |
|
18-Mar-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: don't store dev_bus_addr dev_bus_addr returned in the grant ref map operation is the mfn of the passed page, there's no need to store it in the persistent grant entry, since we can always get it provided that we have the page. This reduces the memory overhead of persistent grants in blkback. While at it, rename the 'seg[i].buf' to be 'seg[i].offset' as it makes much more sense - as we use that value in bio_add_page which as the fourth argument expects the offset. We hadn't used the physical address as part of this at all. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xen.org [v1: s/buf/offset/] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
217fd5e7 |
|
18-Mar-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: fix foreach_grant_safe to handle empty lists We may use foreach_grant_safe in the future with empty lists, so make sure we can handle them. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: xen-devel@lists.xen.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0e5e098a |
|
11-Mar-2013 |
Jan Beulich <JBeulich@suse.com> |
xen-blkback: fix dispatch_rw_block_io() error path Commit 7708992 ("xen/blkback: Seperate the bio allocation and the bio submission") consolidated the pendcnt updates to just a single write, neglecting the fact that the error path relied on it getting set to 1 up front (such that the decrement in __end_block_io_op() would actually drop the count to zero, triggering the necessary cleanup actions). Also remove a misleading and a stale (after said commit) comment. CC: stable@vger.kernel.org Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
986cacbd |
|
11-Mar-2013 |
Zoltan Kiss <zoltan.kiss@citrix.com> |
xen/blkback: Change statistics counter types to unsigned These values shouldn't be negative, but after an overflow their value can turn into negative, if they are signed. xentop can show bogus values in this case. Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com> Reported-by: Ichiro Ogino <ichiro.ogino@citrix.co.jp> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0e367ae4 |
|
07-Mar-2013 |
David Vrabel <david.vrabel@citrix.com> |
xen/blkback: correctly respond to unknown, non-native requests If the frontend is using a non-native protocol (e.g., a 64-bit frontend with a 32-bit backend) and it sent an unrecognized request, the request was not translated and the response would have the incorrect ID. This may cause the frontend driver to behave incorrectly or crash. Since the ID field in the request is always in the same place, regardless of the request type we can get the correct ID and make a valid response (which will report BLKIF_RSP_EOPNOTSUPP). This bug affected 64-bit SLES 11 guests when using a 32-bit backend. This guest does a BLKIF_OP_RESERVED_1 (BLKIF_OP_PACKET in the SLES source) and would crash in blkif_int() as the ID in the response would be invalid. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: stable@vger.kernel.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a72d9002 |
|
27-Feb-2013 |
Chen Gang <gang.chen@asianux.com> |
xen/xen-blkback: preq.dev is used without initialized If call xen_vbd_translate failed, the preq.dev will be not initialized. Use blkif->vbd.pdevice instead (still better to print relative info). Note that before commit 01c681d4c70d64cb72142a2823f27c4146a02e63 (xen/blkback: Don't trust the handle from the frontend.) the value bogus, as it was the guest provided value from req->u.rw.handle rather than the actual device. Signed-off-by: Chen Gang <gang.chen@asianux.com> Acked-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
087ffecd |
|
14-Feb-2013 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: use balloon pages for persistent grants With current persistent grants implementation we are not freeing the persistent grants after we disconnect the device. Since grant map operations change the mfn of the allocated page, and we can no longer pass it to __free_page without setting the mfn to a sane value, use balloon grant pages instead, as the gntdev device does. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: stable@vger.kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
01c681d4 |
|
16-Jan-2013 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Don't trust the handle from the frontend. The 'handle' is the device that the request is from. For the life-time of the ring we copy it from a request to a response so that the frontend is not surprised by it. But we do not need it - when we start processing I/Os we have our own 'struct phys_req' which has only most essential information about the request. In fact the 'vbd_translate' ends up over-writing the preq.dev with a value from the backend. This assignment of preq.dev with the 'handle' value is superfluous so lets not do it. Cc: stable@vger.kernel.org Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
7dc34117 |
|
04-Dec-2012 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: implement safe iterator for the list of persistent grants Change foreach_grant iterator to a safe version, that allows freeing the element while iterating. Also move the free code in free_persistent_gnts to prevent freeing the element before the rb_next call. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: xen-devel@lists.xen.org Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4d4f270f |
|
16-Nov-2012 |
Roger Pau Monne <roger.pau@citrix.com> |
xen-blkback: move free persistent grants code Move the code that frees persistent grants from the red-black tree to a function. This will make it easier for other consumers to move this to a common place. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
cb5bd4d1 |
|
02-Nov-2012 |
Roger Pau Monne <roger.pau@citrix.com> |
xen/blkback: persistent-grants fixes This patch contains fixes for persistent grants implementation v2: * handle == 0 is a valid handle, so initialize grants in blkback setting the handle to BLKBACK_INVALID_HANDLE instead of 0. Reported by Konrad Rzeszutek Wilk. * new_map is a boolean, use "true" or "false" instead of 1 and 0. Reported by Konrad Rzeszutek Wilk. * blkfront announces the persistent-grants feature as feature-persistent-grants, use feature-persistent instead which is consistent with blkback and the public Xen headers. * Add a consistency check in blkfront to make sure we don't try to access segments that have not been set. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> [v1: The new_map int->bool had already been changed] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0a8704a5 |
|
24-Oct-2012 |
Roger Pau Monne <roger.pau@citrix.com> |
xen/blkback: Persistent grant maps for xen blk drivers This patch implements persistent grants for the xen-blk{front,back} mechanism. The effect of this change is to reduce the number of unmap operations performed, since they cause a (costly) TLB shootdown. This allows the I/O performance to scale better when a large number of VMs are performing I/O. Previously, the blkfront driver was supplied a bvec[] from the request queue. This was granted to dom0; dom0 performed the I/O and wrote directly into the grant-mapped memory and unmapped it; blkfront then removed foreign access for that grant. The cost of unmapping scales badly with the number of CPUs in Dom0. An experiment showed that when Dom0 has 24 VCPUs, and guests are performing parallel I/O to a ramdisk, the IPIs from performing unmap's is a bottleneck at 5 guests (at which point 650,000 IOPS are being performed in total). If more than 5 guests are used, the performance declines. By 10 guests, only 400,000 IOPS are being performed. This patch improves performance by only unmapping when the connection between blkfront and back is broken. On startup blkfront notifies blkback that it is using persistent grants, and blkback will do the same. If blkback is not capable of persistent mapping, blkfront will still use the same grants, since it is compatible with the previous protocol, and simplifies the code complexity in blkfront. To perform a read, in persistent mode, blkfront uses a separate pool of pages that it maps to dom0. When a request comes in, blkfront transmutes the request so that blkback will write into one of these free pages. Blkback keeps note of which grefs it has already mapped. When a new ring request comes to blkback, it looks to see if it has already mapped that page. If so, it will not map it again. If the page hasn't been previously mapped, it is mapped now, and a record is kept of this mapping. Blkback proceeds as usual. When blkfront is notified that blkback has completed a request, it memcpy's from the shared memory, into the bvec supplied. A record that the {gref, page} tuple is mapped, and not inflight is kept. Writes are similar, except that the memcpy is peformed from the supplied bvecs, into the shared pages, before the request is put onto the ring. Blkback stores a mapping of grefs=>{page mapped to by gref} in a red-black tree. As the grefs are not known apriori, and provide no guarantees on their ordering, we have to perform a search through this tree to find the page, for every gref we receive. This operation takes O(log n) time in the worst case. In blkfront grants are stored using a single linked list. The maximum number of grants that blkback will persistenly map is currently set to RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, to prevent a malicios guest from attempting a DoS, by supplying fresh grefs, causing the Dom0 kernel to map excessively. If a guest is using persistent grants and exceeds the maximum number of grants to map persistenly the newly passed grefs will be mapped and unmaped. Using this approach, we can have requests that mix persistent and non-persistent grants, and we need to handle them correctly. This allows us to set the maximum number of persistent grants to a lower value than RING_SIZE * BLKIF_MAX_SEGMENTS_PER_REQUEST, although setting it will lead to unpredictable performance. In writing this patch, the question arrises as to if the additional cost of performing memcpys in the guest (to/from the pool of granted pages) outweigh the gains of not performing TLB shootdowns. The answer to that question is `no'. There appears to be very little, if any additional cost to the guest of using persistent grants. There is perhaps a small saving, from the reduced number of hypercalls performed in granting, and ending foreign access. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up the misuse of bool as int]
|
#
2fc136ee |
|
11-Sep-2012 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/m2p: do not reuse kmap_op->dev_bus_addr If the caller passes a valid kmap_op to m2p_add_override, we use kmap_op->dev_bus_addr to store the original mfn, but dev_bus_addr is part of the interface with Xen and if we are batching the hypercalls it might not have been written by the hypervisor yet. That means that later on Xen will write to it and we'll think that the original mfn is actually what Xen has written to it. Rather than "stealing" struct members from kmap_op, keep using page->index to store the original mfn and add another parameter to m2p_remove_override to get the corresponding kmap_op instead. It is now responsibility of the caller to keep track of which kmap_op corresponds to a particular page in the m2p_override (gntdev, the only user of this interface that passes a valid kmap_op, is already doing that). CC: stable@kernel.org Reported-and-Tested-By: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
e79affc3 |
|
08-Aug-2012 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen/arm: compile blkfront and blkback Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4dae7670 |
|
13-Mar-2012 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Squash the discard support for 'file' and 'phy' type. The only reason for the distinction was for the special case of 'file' (which is assumed to be loopback device), was to reach inside the loopback device, find the underlaying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case and we now based the discard support based on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'. CC: Li Dongyang <lidongyang@novell.com> Acked-by: Jan Beulich <JBeulich@suse.com> [v1: Dropping pointless initializer and keeping blank line] [v2: Remove the kfree as it is not used anymore] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b2167ba6 |
|
28-Nov-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/blkback: Enable blkback on HVM guests Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
4f14faaa |
|
28-Nov-2011 |
Daniel De Graaf <dgdegra@tycho.nsa.gov> |
xen/blkback: use grant-table.c hypercall wrappers Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ae18be11 |
|
10-Nov-2011 |
Li Dongyang <lidongyang@novell.com> |
xen-blkback: convert hole punching to discard request on loop devices As of dfaa2ef68e80c378e610e3c8c536f1c239e8d3ef, loop devices support discard request now. We could just issue a discard request, and the loop driver will punch the hole for us, so we don't need to touch the internals of loop device and punch the hole ourselves, Thanks. V0->V1: rebased on devel/for-jens-3.3 Signed-off-by: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
42146352 |
|
12-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Move processing of BLKIF_OP_DISCARD from dispatch_rw_block_io .. and move it to its own function that will deal with the discard operation. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5ea42986 |
|
12-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blk[front|back]: Enhance discard support with secure erasing support. Part of the blkdev_issue_discard(xx) operation is that it can also issue a secure discard operation that will permanantly remove the sectors in question. We advertise that we can support that via the 'discard-secure' attribute and on the request, if the 'secure' bit is set, we will attempt to pass in REQ_DISCARD | REQ_SECURE. CC: Li Dongyang <lidongyang@novell.com> [v1: Used 'flag' instead of 'secure:1' bit] [v2: Use 'reserved' uint8_t instead of adding a new value] [v3: Check for nseg when mapping instead of operation] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
97e36834 |
|
11-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blk[front|back]: Squash blkif_request_rw and blkif_request_discard together In a union type structure to deal with the overlapping attributes in a easier manner. Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
6927d920 |
|
17-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Fix two races in the handling of barrier requests. There are two windows of opportunity to cause a race when processing a barrier request. This patch fixes this. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
dda18528 |
|
13-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Check for proper operation. The patch titled: "xen/blkback: Fix the inhibition to map pages when discarding sector ranges." had the right idea except that it used the wrong comparison operator. It had == instead of !=. This fixes the bug where all (except discard) operations would have been ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
64391b25 |
|
09-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Fix the inhibition to map pages when discarding sector ranges. The 'operation' parameters are the ones provided to the bio layer while the req->operation are the ones passed in between the backend and frontend. We used the wrong 'operation' value to squash the call to map pages when processing the discard operation resulting in an hypercall that did nothing. Lets guard against going in the mapping function by checking for the proper operation type. CC: Li Dongyang <lidongyang@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
5c62cb48 |
|
09-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Report VBD_WSECT (wr_sect) properly. We did not increment the amount of sectors written to disk b/c we tested for the == WRITE which is incorrect - as the operations are more of WRITE_FLUSH, WRITE_ODIRECT. This patch fixes it by doing a & WRITE check. CC: stable@kernel.org Reported-by: Andy Burns <xen.lists@burns.me.uk> Suggested-by: Ian Campbell <Ian.Campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
29bde093 |
|
09-Oct-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests. We emulate the barrier requests by draining the outstanding bio's and then sending the WRITE_FLUSH command. To drain the I/Os we use the refcnt that is used during disconnect to wait for all the I/Os before disconnecting from the frontend. We latch on its value and if it reaches either the threshold for disconnect or when there are no more outstanding I/Os, then we have drained all I/Os. Suggested-by: Christopher Hellwig <hch@infradead.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8e6dc6fe |
|
16-Sep-2011 |
Jan Beulich <JBeulich@suse.com> |
xen-blkback: use kzalloc() in favor of kmalloc()+memset() This fixes the problem of three of those four memset()-s having improper size arguments passed: Sizeof a pointer-typed expression returns the size of the pointer, not that of the pointed to data. It also reverts using kmalloc() instead of kzalloc() for the allocation of the pending grant handles array, as that array gets fully initialized in a subsequent loop. Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b3cb0d6a |
|
01-Sep-2011 |
Li Dongyang <lidongyang@novell.com> |
xen-blkback: Implement discard requests ('feature-discard') ..aka ATA TRIM/SCSI UNMAP command to be passed through the frontend and used as appropiately by the backend. We also advertise certain granulity parameters to the frontend so it can plug them in. If the backend is a realy device - we just end up using 'blkdev_issue_discard' while for loopback devices - we just punch a hole in the image file. Signed-off-by: Li Dongyang <lidongyang@novell.com> [v1: Fixed up pr_debug and commit description] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
0930bba6 |
|
29-Sep-2011 |
Stefano Stabellini <stefano.stabellini@eu.citrix.com> |
xen: modify kernel mappings corresponding to granted pages If we want to use granted pages for AIO, changing the mappings of a user vma and the corresponding p2m is not enough, we also need to update the kernel mappings accordingly. Currently this is only needed for pages that are created for user usages through /dev/xen/gntdev. As in, pages that have been in use by the kernel and use the P2M will not need this special mapping. However there are no guarantees that in the future the kernel won't start accessing pages through the 1:1 even for internal usage. In order to avoid the complexity of dealing with highmem, we allocated the pages lowmem. We issue a HYPERVISOR_grant_table_op right away in m2p_add_override and we remove the mappings using another HYPERVISOR_grant_table_op in m2p_remove_override. Considering that m2p_add_override and m2p_remove_override are called once per page we use multicalls and hypercall batching. Use the kmap_op pointer directly as argument to do the mapping as it is guaranteed to be present up until the unmapping is done. Before issuing any unmapping multicalls, we need to make sure that the mapping has already being done, because we need the kmap->handle to be set correctly. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> [v1: Removed GRANT_FRAME_BIT usage] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a7e9357f |
|
29-Jun-2011 |
Bastian Blank <waldi@debian.org> |
xen/blkback: Add module alias for autoloading Add xen-backend:vbd module alias to the xen-blkback module. This allows automatic loading of the module. Signed-off-by: Bastian Blank <waldi@debian.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b4726a9d |
|
28-May-2011 |
Daniel Stodden <daniel.stodden@citrix.com> |
xen/blkback: Don't let in-flight requests defer pending ones. Running RING_FINAL_CHECK_FOR_REQUESTS from make_response is a bad idea. It means that in-flight I/O is essentially blocking continued batches. This essentially kills throughput on frontends which unplug (or even just notify) early and rightfully assume addtional requests will be picked up on time, not synchronously. Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com> [v1: Rebased and fixed compile problems] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
9b83c771 |
|
27-May-2011 |
Dan Carpenter <error27@gmail.com> |
xen/blkback: potential null dereference in error handling blkbk->pending_pages can be NULL here so I added a check for it. Signed-off-by: Dan Carpenter <error27@gmail.com> [v1: Redid the loop a bit] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8ab52150 |
|
17-May-2011 |
Jan Beulich <JBeulich@novell.com> |
xen/blkback: don't fail empty barrier requests The sector number on empty barrier requests may (will?) be -1, which, given that it's being treated as unsigned 64-bit quantity, will almost always exceed the actual (virtual) disk's size. Inspired by Konrad's "When writting barriers set the sector number to zero...". While at it also add overflow checking to the math in vbd_translate(). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
496b318e |
|
13-May-2011 |
Laszlo Ersek <lersek@redhat.com> |
xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end() vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double xenbus_transaction_end() calls. The next down_read() in xenbus_transaction_start() (at eg. the next resize attempt) hangs. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317 Acked-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
cca537af |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: if log_stats is enabled print out the data. And not depend on the driver being built with -DDEBUG flag. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
3d814731 |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Prefix 'vbd' with 'xen' in structs and functions. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
30fd1502 |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Change structure name blkif_st to xen_blkif. No need for that '_st' and xen_blkif is more apt. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
b0f80127 |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Fixing some more of the cleanpatch.pl warnings. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
03e0edf9 |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Checkpatch.pl recommend against multiple assigments. CHECK: multiple assignments should be avoided Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
72468bfc |
|
11-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Removing the debug_lvl option. It is not really used for anything. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
22b20f2d |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Use the DRV_PFX in the pr_.. macros. To make it easier to read. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ebe81906 |
|
12-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Change printk/DPRINTK to pr_.. type variant. And also make them uniform and prefix the message with 'xen-blkback'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
01f37f2d |
|
11-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Fixed up comments and converted spaces to tabs. Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
3d68b399 |
|
05-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Fix up some of the comments. They had the wrong data or were in the wrong spot. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
fc53bf75 |
|
05-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Squash the checking for operation into dispatch_rw_block_io We do a check for the operations right before calling dispatch_rw_block_io. And then we do the same check in dispatch_rw_block_io. This patch squashes those checks into the 'dispatch_rw_block_io' function. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
24f567f9 |
|
04-May-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER. We drop the support for 'feature-barrier' and add in the support for the 'feature-flush-cache' if the real backend storage supports flushing. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
a19be5f0 |
|
26-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
Revert "xen/blkback: Move the plugging/unplugging to a higher level." This reverts commit 97961ef46b9b5a6a7c918a38b898a7b3e49869f4 b/c we lose about 15% performance if we do the unplugging and the end of the reading the ring buffer.
|
#
013c3ca1 |
|
26-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler. If one runs a simple fio request with random read/write with a 20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler. IOmeter | | | | 64K, randrw | NOOP | CFQ | deadline | randrwmix=80 | | | | --------------+-------+------+----------+ blkback |103/27 |32/10 | 102/27 | --------------+-------+------+----------+ QEMU qdisk |103/27 |102/27| 102/27 | The problem as explained by Vivek Goyal was: ".. that difference is that sync vs async requests. In the case of a kernel thread submitting IO, [..] all the WRITES might be being considered as async and will go in a different queue. If you mix those with some READS, they are always sync and will go in differnet queue. In presence of sync queue, CFQ will idle and choke up WRITES in an attempt to improve latencies of READs. In case of AIO [note: this is what QEMU qdisk is doing] , [..] it is direct IO and both READS and WRITES will be considered SYNC and will go in a single queue and no choking of WRITES will take place." The solution is quite simple, tack on REQ_SYNC (which is what the WRITE_ODIRECT macro points to) and the numbers go back up. Suggested-by: Vivek Goyal <vgoyal@redhat.com Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
97961ef4 |
|
25-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Move the plugging/unplugging to a higher level. We used to the plug/unplug on the submit_bio. But that means if within a stream of WRITE, WRITE, WRITE,...,WRITE we have one READ, it could stall the pipeline (as the 'submio_bio' could trigger the unplug_fnc to be called and stall/sync when doing the READ). Instead we want to move the unplugging when the whole (or as a much as possible) ring buffer has been processed. This also eliminates us doing plug/unplug for each request. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
8b6bf747 |
|
20-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Prefix exposed functions with xen_ And also shorten the name if it has blkback to blkbk. This results in the symbol table (if compiled in the kernel) to be much shorter, prettier, and also easier to search for. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
42c7841d |
|
20-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen-blkback: Inline some of the functions that were moved from vbd/interface.c Shuffling code around. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
ee9ff853 |
|
20-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly. Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the critical bits where they belong, respectively. Leaving only blkback.c for the data- and xenbus.c for the control path. Suggested-by: Daniel Stodden <daniel.stodden@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
#
dfc07b13 |
|
18-Apr-2011 |
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> |
xen/blkback: Move it from drivers/xen to drivers/block .. and modify the Makefile and Kconfig files appropriately. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|