History log of /linux-master/drivers/block/drbd/drbd_req.c
Revision Date Author Comments
# 0d11f3cf 29-Mar-2023 Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: Pass a peer device to the resync and online verify functions

Originally-from: Andreas Grünbacher <agruen@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ad878a0d 29-Mar-2023 Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: pass drbd_peer_device to __req_mod

In preparation to support multiple connections, we need to know which
one we need to modify the request state for.

Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20230330102744.2128122-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 613b1488 04-Jan-2023 Jens Axboe <axboe@kernel.dk>

block: handle bio_split_to_limits() NULL return

This can't happen right now, but in preparation for allowing
bio_split_to_limits() returning NULL if it ended the bio, check for it
in all the callers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e3fa02d7 30-Nov-2022 Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: introduce drbd_ratelimit()

Use call site specific ratelimit instead of one single static global.
Also ratelimit ASSERTION messages generated by expect().

Originally-from: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221201110349.1282687-5-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 93c68cc4 22-Nov-2022 Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: use consistent license

DRBD currently has a mix of GPL-2.0 and GPL-2.0-or-later SPDX license
identifiers. We have decided to stick with GPL 2.0 only, so consistently
use that identifier.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20221122134301.69258-5-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6d42ddf7 20-Oct-2022 Christoph Böhmwalder <christoph.boehmwalder@linbit.com>

drbd: only clone bio if we have a backing device

Commit c347a787e34cb (drbd: set ->bi_bdev in drbd_req_new) moved a
bio_set_dev call (which has since been removed) to "earlier", from
drbd_request_prepare to drbd_req_new.

The problem is that this accesses device->ldev->backing_bdev, which is
not NULL-checked at this point. When we don't have an ldev (i.e. when
the DRBD device is diskless), this leads to a null pointer deref.

So, only allocate the private_bio if we actually have a disk. This is
also a small optimization, since we don't clone the bio to only to
immediately free it again in the diskless case.

Fixes: c347a787e34cb ("drbd: set ->bi_bdev in drbd_req_new")
Co-developed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Co-developed-by: Joel Colledge <joel.colledge@linbit.com>
Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221020085205.129090-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 5a97806f 26-Jul-2022 Christoph Hellwig <hch@lst.de>

block: change the blk_queue_split calling convention

The double indirect bio leads to somewhat suboptimal code generation.
Instead return the (original or split) bio, and make sure the
request_queue arguments to the lower level helpers is passed after the
bio to avoid constant reshuffling of the argument passing registers.

Also give it and the helpers used to implement it more descriptive names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 1b70ccec 12-Jul-2022 Christoph Hellwig <hch@lst.de>

drbd: stop using bdevname in drbd_report_io_error

Just use the %pg format specifier instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220713055317.1888500-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8fd6533e 06-Apr-2022 Haowen Bai <baihaowen@meizu.com>

drbd: Return true/false (not 1/0) from bool functions

Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions. This fixes the following warnings from coccicheck:

./drivers/block/drbd/drbd_req.c:912:9-10: WARNING: return of 0/1 in
function 'remote_due_to_read_balancing' with return type bool

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-8-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2651ee5a 31-Mar-2022 Jakob Koschel <jakobkoschel@gmail.com>

drbd: remove check of list iterator against head past the loop body

When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220331220349.885126-2-jakobkoschel@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f4329d1f 30-Mar-2022 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential silent data corruption

Scenario:
---------

bio chain generated by blk_queue_split().
Some split bio fails and propagates its error status to the "parent" bio.
But then the (last part of the) parent bio itself completes without error.

We would clobber the already recorded error status with BLK_STS_OK,
causing silent data corruption.

Reproducer:
-----------

How to trigger this in the real world within seconds:

DRBD on top of degraded parity raid,
small stripe_cache_size, large read_ahead setting.
Drop page cache (sysctl vm.drop_caches=1, fadvise "DONTNEED",
umount and mount again, "reboot").

Cause significant read ahead.

Large read ahead request is split by blk_queue_split().
Parts of the read ahead that are already in the stripe cache,
or find an available stripe cache to use, can be serviced.
Parts of the read ahead that would need "too much work",
would need to wait for a "stripe_head" to become available,
are rejected immediately.

For larger read ahead requests that are split in many pieces, it is very
likely that some "splits" will be serviced, but then the stripe cache is
exhausted/busy, and the remaining ones will be rejected.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: <stable@vger.kernel.org> # 4.13.x
Link: https://lore.kernel.org/r/20220330185551.3553196-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a34592ff 09-Feb-2022 Christoph Hellwig <hch@lst.de>

scsi: drbd: Remove WRITE_SAME support

REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code,
which has switched to use REQ_OP_WRITE_ZEROES long ago.

Link: https://lore.kernel.org/r/20220209082828.2629273-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>


# b9b1335e 22-Mar-2022 NeilBrown <neilb@suse.de>

remove bdi_congested() and wb_congested() and related functions

These functions are no longer useful as no BDIs report congestions any
more.

Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code - and remove the
functions.

[akpm@linux-foundation.org: fix build]

Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# abfc426d 02-Feb-2022 Christoph Hellwig <hch@lst.de>

block: pass a block_device to bio_clone_fast

Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c347a787 02-Feb-2022 Christoph Hellwig <hch@lst.de>

drbd: set ->bi_bdev in drbd_req_new

Make sure the newly allocated bio has the correct bi_bdev set from the
start.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3e08773c 12-Oct-2021 Christoph Hellwig <hch@lst.de>

block: switch polling to be bio based

Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

- the cookie construction can made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio allows to trivially
support polling BIOs remapping by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# edb0872f 09-Aug-2021 Christoph Hellwig <hch@lst.de>

block: move the bdi from the request_queue to the gendisk

The backing device information only makes sense for file system I/O,
and thus belongs into the gendisk and not the lower level request_queue
structure. Move it there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20210809141744.1203023-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6327c911 20-Apr-2021 Gustavo A. R. Silva <gustavo@embeddedor.com>

drbd: Fix fall-through warnings for Clang

In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding a break statement instead of just
letting the code fall through to the next, and by adding a fallthrough
pseudo-keyword in places whre the code is intended to fall through.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ae7153f1 26-Jan-2021 Christoph Hellwig <hch@lst.de>

drbd: remove drbd_req_make_private_bio

Open code drbd_req_make_private_bio in the two callers to prepare
for further changes. Also don't bother to initialize bi_next as the
bio code already does that that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 370276ba 21-Jan-2021 Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

drbd: remove unused argument from drbd_request_prepare and __drbd_make_request

We can remove start_jif since it is not used by drbd_request_prepare,
then remove it from __drbd_make_request further.

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 309dca30 24-Jan-2021 Christoph Hellwig <hch@lst.de>

block: store a block_device pointer in struct bio

Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block device. From that the gendisk can be trivially
accessed with an extra indirection, but it also allows to directly
look up all information related to partition remapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 155bd9d1 25-Sep-2020 Christoph Hellwig <hch@lst.de>

drbd: remove ->this_bdev

DRBD keeps a block device open just to get and set the capacity from
it. Switch to primarily using the disk capacity as intended by the
block layer, and sync it to the bdev using revalidate_disk_size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# ed00aabd 01-Jul-2020 Christoph Hellwig <hch@lst.de>

block: rename generic_make_request to submit_bio_noacct

generic_make_request has always been very confusingly misnamed, so rename
it to submit_bio_noacct to make it clear that it is submit_bio minus
accounting and a few checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c62b37d9 01-Jul-2020 Christoph Hellwig <hch@lst.de>

block: move ->make_request_fn to struct block_device_operations

The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with
a block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# f695ca38 01-Jul-2020 Christoph Hellwig <hch@lst.de>

block: remove the request_queue argument from blk_queue_split

The queue can be trivially derived from the bio, so pass one less
argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 84bc83c3 01-Jul-2020 Christoph Hellwig <hch@lst.de>

drbd: stop using ->queuedata

Instead of setting up the queuedata as well just use one private data
field.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 24d69293 26-May-2020 Christoph Hellwig <hch@lst.de>

drbd: use bio_{start,end}_io_acct

Switch drbd to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ec45a263 27-Nov-2019 zhengbin <zhengbin13@huawei.com>

drbd: Remove unneeded semicolon

Fixes coccicheck warning:

drivers/block/drbd/drbd_req.c:887:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c6ae4c04 22-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 91

Based on 1 normalized pattern(s):

is free software you can redistribute it and or modify it under the
terms of the gnu general public license as published by the free
software foundation either version 2 or at your option any later
version [drbd] is distributed in the hope that it will be useful but
without any warranty without even the implied warranty of
merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with [drbd] see the
file copying if not write to the free software foundation 675 mass
ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 16 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520075212.050796421@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# e16fb3a8 22-Jan-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

block: Mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_int.h:1774:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_receiver.c:3093:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_receiver.c:3120:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/block/drbd/drbd_req.c:856:6: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>


# f31e583a 20-Dec-2018 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")

And also re-enable partial-zero-out + discard aligned.

With the introduction of REQ_OP_WRITE_ZEROES,
we started to use that for both WRITE_ZEROES and DISCARDS,
hoping that WRITE_ZEROES would "do what we want",
UNMAP if possible, zero-out the rest.

The example scenario is some LVM "thin" backend.

While an un-allocated block on dm-thin reads as zeroes, on a dm-thin
with "skip_block_zeroing=true", after a partial block write allocated
that block, that same block may well map "undefined old garbage" from
the backends on LBAs that have not yet been written to.

If we cannot distinguish between zero-out and discard on the receiving
side, to avoid "undefined old garbage" to pop up randomly at later times
on supposedly zero-initialized blocks, we'd need to map all discards to
zero-out on the receiving side. But that would potentially do a full
alloc on thinly provisioned backends, even when the expectation was to
unmap/trim/discard/de-allocate.

We need to distinguish on the protocol level, whether we need to guarantee
zeroes (and thus use zero-out, potentially doing the mentioned full-alloc),
or if we want to put the emphasis on discard, and only do a "best effort
zeroing" (by "discarding" blocks aligned to discard-granularity, and zeroing
only potential unaligned head and tail clippings to at least *try* to
avoid "false positives" in an online-verify later), hoping that someone
set skip_block_zeroing=false.

For some discussion regarding this on dm-devel, see also
https://www.mail-archive.com/dm-devel%40redhat.com/msg07965.html
https://www.redhat.com/archives/dm-devel/2018-January/msg00271.html

For backward compatibility, P_TRIM means zero-out, unless the
DRBD_FF_WZEROES feature flag is agreed upon during handshake.

To have upper layers even try to submit WRITE ZEROES requests,
we need to announce "efficient zeroout" independently.

We need to fixup max_write_zeroes_sectors after blk_queue_stack_limits():
if we can handle "zeroes" efficiently on the protocol,
we want to do that, even if our backend does not announce
max_write_zeroes_sectors itself.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9305455a 03-Oct-2018 Bart Van Assche <bvanassche@acm.org>

block: Finish renaming REQ_DISCARD into REQ_OP_DISCARD

Some time ago REQ_DISCARD was renamed into REQ_OP_DISCARD. Some comments
and documentation files were not updated however. Update these comments
and documentation files. See also commit 4e1b2d52a80d ("block, fs,
drivers: remove REQ_OP compat defs and related code").

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ddcf35d3 18-Jul-2018 Michael Callahan <michaelcallahan@fb.com>

block: Add and use op_stat_group() for indexing disk_stat fields.

Add and use a new op_stat_group() function for indexing partition stat
fields rather than indexing them by rq_data_dir() or bio_data_dir().
This function works similarly to op_is_sync() in that it takes the
request::cmd_flags or bio::bi_opf flags and determines which stats
should et updated.

In addition, the second parameter to generic_start_io_acct() and
generic_end_io_acct() is now a REQ_OP rather than simply a read or
write bit and it uses op_stat_group() on the parameter to determine
the stat group.

Note that the partition in_flight counts are not part of the per-cpu
statistics and as such are not indexed via this function. It's now
indexed by op_is_write().

tj: Refreshed on top of v4.17. Updated to pass around REQ_OP.

Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Matias Bjorling <mb@lightnvm.io>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# fad2d4ef 25-Jun-2018 Bart Van Assche <bvanassche@acm.org>

drbd: Fix drbd_request_prepare() discard handling

Fix the test that verifies whether bio_op(bio) represents a discard
or write zeroes operation. Compile-tested only.

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Fixes: 7435e9018f91 ("drbd: zero-out partial unaligned discards on local backend")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0892fac8 20-May-2018 Kent Overstreet <kent.overstreet@gmail.com>

drbd: convert to bioset_init()/mempool_init()

Convert drbd to embedded bio sets and mempools.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2bccef39 17-Oct-2017 Kees Cook <keescook@chromium.org>

drbd: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: drbd-dev@lists.linbit.com
Signed-off-by: Kees Cook <keescook@chromium.org>


# de6978be 29-Aug-2017 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: add explicit plugging when submitting batches

When submitting batches of requests which had been queued on the
submitter thread, typically because they needed to wait for an
activity log transactions, use explicit plugging to help potential
merging of requests in the backend io-scheduler.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9da10e8d 29-Aug-2017 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: change list_for_each_safe to while(list_first_entry_or_null)

Two instances of list_for_each_safe can drop their tmp element, they
really just peel off each element in turn from the start of the list.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c51a0ef3 29-Aug-2017 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: introduce drbd_recv_header_maybe_unplug

Recently, drbd_recv_header() was changed to potentially
implicitly "unplug" the backend device(s), in case there
is currently nothing to receive.

Be more explicit about it: re-introduce the original drbd_recv_header(),
and introduce a new drbd_recv_header_maybe_unplug() for use by the
receiver "main loop".

Using explicit plugging via blk_start_plug(); blk_finish_plug();
really helps the io-scheduler of the backend with merging requests.

Wrap the receiver "main loop" with such a plug.
Also catch unplug events on the Primary,
and try to propagate.

This is performance relevant. Without this, if the receiving side does
not merge requests, number of IOPS on the peer can me significantly
higher than IOPS on the Primary, and can easily become the bottleneck.

Together, both changes should help to reduce the number of IOPS
as seen on the backend of the receiving side, by increasing
the chance of merging mergable requests, without trading latency
for more throughput.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 74d46992 23-Aug-2017 Christoph Hellwig <hch@lst.de>

block: replace bi_bdev with a gendisk pointer and partitions index

This way we don't need a block_device structure to submit I/O. The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open. Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device. But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d62e26b3 30-Jun-2017 Jens Axboe <axboe@kernel.dk>

block: pass in queue to inflight accounting

No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# af67c31f 17-Jun-2017 NeilBrown <neilb@suse.com>

blk: remove bio_set arg from blk_queue_split()

blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.

Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.

This is inconsistent and unnecessary. Remove the last arg and always use
q->bio_split inside blk_queue_split()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4e4cbee9 03-Jun-2017 Christoph Hellwig <hch@lst.de>

block: switch bios to blk_status_t

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>


# a00ebd1c 11-May-2017 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()

When killing kref_sub(), the unconditional additional kref_get()
was not properly paired with the necessary kref_put(), causing
a leak of struct drbd_requests (~ 224 Bytes) per submitted bio,
and breaking DRBD in general, as the destructor of those "drbd_requests"
does more than just the mempoll_free().

Fixes: bdfafc4ffdd2 ("locking/atomic, kref: Kill kref_sub()")
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: stable@vger.kernel.org # v4.11
Signed-off-by: Jens Axboe <axboe@fb.com>


# 45c21793 05-Apr-2017 Christoph Hellwig <hch@lst.de>

drbd: implement REQ_OP_WRITE_ZEROES

It seems like DRBD assumes its on the wire TRIM request always zeroes data.
Use that fact to implement REQ_OP_WRITE_ZEROES.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 0dbed96a 05-Apr-2017 Christoph Hellwig <hch@lst.de>

drbd: make intelligent use of blkdev_issue_zeroout

drbd always wants its discard wire operations to zero the blocks, so
use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
reinventing it poorly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# dc3b17cc 02-Feb-2017 Jan Kara <jack@suse.cz>

block: Use pointer to backing_dev_info from request_queue

We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>


# bdfafc4f 14-Nov-2016 Peter Zijlstra <peterz@infradead.org>

locking/atomic, kref: Kill kref_sub()

By general sentiment kref_sub() is a bad interface, make it go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 2c935bc5 14-Nov-2016 Peter Zijlstra <peterz@infradead.org>

locking/atomic, kref: Add kref_read()

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

atomic_read(&kref->refcount)
kref->refcount.counter

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 1eff9d32 05-Aug-2016 Jens Axboe <axboe@fb.com>

block: rename bio bi_rw to bi_opf

Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>


# 70246286 19-Jul-2016 Christoph Hellwig <hch@lst.de>

block: get rid of bio_rw and READA

These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7e5fec31 13-Jun-2016 Fabian Frederick <fabf@skynet.be>

drbd: code cleanups without semantic changes

This contains various cosmetic fixes ranging from simple typos to
const-ifying, and using booleans properly.

Original commit messages from Fabian's patch set:
drbd: debugfs: constify drbd_version_fops
drbd: use seq_put instead of seq_print where possible
drbd: include linux/uaccess.h instead of asm/uaccess.h
drbd: use const char * const for drbd strings
drbd: kerneldoc warning fix in w_e_end_data_req()
drbd: use unsigned for one bit fields
drbd: use bool for peer is_ states
drbd: fix typo
drbd: use | for bitmask combination
drbd: use true/false for bool
drbd: fix drbd_bm_init() comments
drbd: introduce peer state union
drbd: fix maybe_pull_ahead() locking comments
drbd: use bool for growing
drbd: remove redundant declarations
drbd: replace if/BUG by BUG_ON

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9104d31a 13-Jun-2016 Lars Ellenberg <lars@linbit.com>

drbd: introduce WRITE_SAME support

We will support WRITE_SAME, if
* all peers support WRITE_SAME (both in kernel and DRBD version),
* all peer devices support WRITE_SAME
* logical_block_size is identical on all peers.

We may at some point introduce a fallback on the receiving side
for devices/kernels that do not support WRITE_SAME,
by open-coding a submit loop. But not yet.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 0ead5cca 13-Jun-2016 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: if there is no good data accessible, writes should be IO errors

If DRBD lost all path to good data,
and the on-no-data-accessible policy is OND_SUSPEND_IO,
all pending and new IO requests are suspended (will block).

If that setting is OND_IO_ERROR, IO will still be completed.
READ to "clean" areas (e.g. on an D_INCONSISTENT device,
and bitmap indicates a block is already in sync) will succeed.
READ to "unclean" areas (bitmap indicates block is out-of-sync),
will return EIO.

If we are already D_DISKLESS (or D_FAILED), we also return EIO.

Unfortunately, on a former R_PRIMARY C_SYNC_TARGET D_INCONSISTENT,
after replication link loss, new WRITE requests still went through OK.

The would also set the "out-of-sync" bit on their way, so READ after
WRITE would still return EIO. Also, the data generation UUIDs had not
been bumped, we would cause data divergence, without being able to
detect it on the next sync handshake, given the right sequence of events
in a multiple error scenario and "improper" order of recovery actions.

The right thing to do is to return EIO for all new writes,
unless we have access to good, current, D_UP_TO_DATE data.

The "established best practices" way to avoid these situations in the
first place is to set OND_SUSPEND_IO, or even do a hard-reset from
the pri-on-incon-degr policy helper hook.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 7435e901 13-Jun-2016 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: zero-out partial unaligned discards on local backend

For consistency, also zero-out partial unaligned chunks of discard
requests on the local backend.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 1b228c98 13-Jun-2016 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix regression: protocol A sometimes synchronous, C sometimes double-latency

Regression introduced with 8.4.5
drbd: application writes may set-in-sync in protocol != C

Overwriting the same block (LBA) while a former version is still
"in-flight" to the peer (to be exact: we did not receive the
P_BARRIER_ACK for its epoch yet) would wait for the full epoch of that
former version to be acknowledged by the peer.

In synchronous and quasi-synchronous protocols C and B,
this may double the latency on overwrites.

With protocol A, which is supposed to be asynchronous and only wait for
local completion, it is even worse: it would make overwrites
quasi-synchronous, they would be hit by the full RTT, which protocol A
was specifically meant to avoid, and possibly the additional time it
takes to drain the buffers first.

Particularly bad for databases, or anything else that
does frequent updates to the same blocks (various file system meta data).

No impact if >= rtt passes between updates to the same block.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 28a8f0d3 05-Jun-2016 Mike Christie <mchristi@redhat.com>

block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f85d9f2d 18-May-2015 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix "endless" transfer log walk in protocol A

Don't remember a DRBD request as ack_pending, if it is not.

In protocol A, we usually clear RQ_NET_PENDING at the same time we set
RQ_NET_SENT, so when deciding to remember it as ack_pending,
mod_rq_state needs to look at the current request state,
not at the previous state before the current modification was applied.

This should prevent advance_conn_req_ack_pending() from walking the full
transfer log just to find NULL in protocol A, which would cause serious
performance degradation with many "in-flight" requests, e.g. when
working via DRBD-proxy, or with a huge bandwidth-delay product.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 668700b4 16-Mar-2015 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Create a dedicated workqueue for sending acks on the control connection

The intention is to reduce CPU utilization. Recent measurements
unveiled that the current performance bottleneck is CPU utilization
on the receiving node. The asender thread became CPU limited.

One of the main points is to eliminate the idr_for_each_entry() loop
from the sending acks code path.

One exception in that is sending back ping_acks. These stay
in the ack-receiver thread. Otherwise the logic becomes too
complicated for no added value.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 84d34f2f 19-Feb-2015 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: improve network timeout detection

Don't blame the peer for being unresponsive,
if we did not even ask the question yet.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 05cbbb39 16-Jan-2015 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fix spurious disk-timeout

(You should not use disk-timeout anyways,
see the man page for why...)

We add incoming requests to the tail of some ring list.
On local completion, requests are removed from that list.
The timer looks only at the head of that ring list,
so is supposed to only see the oldest request.
All protected by a spinlock.

The request object is created with timestamps zeroed out.
The timestamp was only filled in just before the actual submit.
But to actually submit the request, we need to give up the spinlock.

If you are unlucky, there is no older still pending request, the timer
looks at a new request with timestamp still zero (before it even was
submitted), and 0 + timeout is most likely older than "now".

Better assign the timestamp right when we put the
request object on said ring list.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2e9ffde6 08-Aug-2014 Andreas Gruenbacher <agruen@linbit.com>

drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync()

There is no need to have these two as inline functions. In addition,
drbd_should_send_out_of_sync() is only used in a single place, anyway.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# dece1635 05-Nov-2015 Jens Axboe <axboe@fb.com>

block: change ->make_request_fn() and users to return a queue cookie

No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.

Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>


# 8ae12666 28-Apr-2015 Kent Overstreet <kent.overstreet@gmail.com>

block: kill merge_bvec_fn() completely

As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 54efd50b 23-Apr-2015 Kent Overstreet <kent.overstreet@gmail.com>

block: make generic_make_request handle arbitrarily sized bios

The way the block layer is currently written, it goes to great lengths
to avoid having to split bios; upper layer code (such as bio_add_page())
checks what the underlying device can handle and tries to always create
bios that don't need to be split.

But this approach becomes unwieldy and eventually breaks down with
stacked devices and devices with dynamic limits, and it adds a lot of
complexity. If the block layer could split bios as needed, we could
eliminate a lot of complexity elsewhere - particularly in stacked
drivers. Code that creates bios can then create whatever size bios are
convenient, and more importantly stacked drivers don't have to deal with
both their own bio size limitations and the limitations of the
(potentially multiple) devices underneath them. In the future this will
let us delete merge_bvec_fn and a bunch of other code.

We do this by adding calls to blk_queue_split() to the various
make_request functions that need it - a few can already handle arbitrary
size bios. Note that we add the call _after_ any call to
blk_queue_bounce(); this means that blk_queue_split() and
blk_recalc_rq_segments() don't need to be concerned with bouncing
affecting segment merging.

Some make_request_fn() callbacks were simple enough to audit and verify
they don't need blk_queue_split() calls. The skipped ones are:

* nfhd_make_request (arch/m68k/emu/nfblock.c)
* axon_ram_make_request (arch/powerpc/sysdev/axonram.c)
* simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c)
* brd_make_request (ramdisk - drivers/block/brd.c)
* mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c)
* loop_make_request
* null_queue_bio
* bcache's make_request fns

Some others are almost certainly safe to remove now, but will be left
for future patches.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Jim Paris <jim@jtan.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: skip more mq-based drivers, resolve merge conflicts, etc.]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 4246a0b6 20-Jul-2015 Christoph Hellwig <hch@lst.de>

block: add a bi_error field to struct bio

Currently we have two different ways to signal an I/O error on a BIO:

(1) by clearing the BIO_UPTODATE flag
(2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario. Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 23fe8f8b 24-Mar-2015 David Rientjes <rientjes@google.com>

block, drbd: fix drbd_req_new() initialization

mempool_alloc() does not support __GFP_ZERO since elements may come from
memory that has already been released by mempool_free().

Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly
initialize it to 0.

Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 24480854 23-Nov-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

drbd: use generic io stats accounting functions to simplify io stat accounting

Use generic io stats accounting help functions (generic_{start,end}_io_acct)
to simplify io stat accounting.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 9581f97a 10-Nov-2014 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Fix state change in case of connection timeout

A connection timeout affects all volumes of a resource!
Under the following conditions:

A resource with multiple volumes
AND
ko-count >=1
AND
a write request triggers the timeout (ko-count * timeout)

DRBD's internal state gets confused. That in turn may
lead to very miss leading follow up failures. E.g.
"BUG: scheduling while atomic"

CC: stable@kernel.org # v3.17
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 3b9d35d7 10-Nov-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: merge_bvec_fn: properly remap bvm->bi_bdev

This was not noticed for many years. Affects operation if
md raid is used a backing device for DRBD.

CC: stable@kernel.org # v3.2+
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 8d4ba3f0 11-Sep-2014 Andreas Gruenbacher <agruen@linbit.com>

drbd: Avoid inconsistent locking warning

request_timer_fn() takes resource->req_lock via the device and releases it via
the connection. Avoid this as it is confusing static code checkers.

Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# f5b90b6b 07-May-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: resync should only lock out specific ranges

During resync, if we need to block some specific incoming write because
of active resync requests to that same range, we potentially caused
*all* new application writes (to "cold" activity log extents) to block
until this one request has been processed.

Improve the do_submit() logic to
* grab all incoming requests to some "incoming" list
* process this list
- move aside requests that are blocked by resync
- prepare activity log transactions,
- commit transactions and submit corresponding requests
- if there are remaining requests that only wait for
activity log extents to become free, stop the fast path
(mark activity log as "starving")
- iterate until no more requests are waiting for the activity log,
but all potentially remaining requests are only blocked by resync
* only then grab new incoming requests

That way, very busy IO on currently "hot" activity log extents cannot
starve scattered IO to "cold" extents. And blocked-by-resync requests
are processed once resync traffic on the affected region has ceased,
without blocking anything else.

The only blocking mode left is when we cannot start requests to "cold"
extents because all currently "hot" extents are actually used.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ad3fee79 20-Dec-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: improve throttling decisions of background resynchronisation

Background resynchronisation does some "side-stepping", or throttles
itself, if it detects application IO activity, and the current resync
rate estimate is above the configured "cmin-rate".

What was not detected: if there is no application IO,
because it blocks on activity log transactions.

Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests,
and count non-zero as application IO activity.
This counter is exposed at proc_details level 2 and above.

Also make sure to release the currently locked resync extent
if we side-step due to such voluntary throttling.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7753a4c1 22-Nov-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: add caching oldest request pointers for replication stages

A request that is to be shipped to the peer goes through a few stages:
- queued
- sent, waiting for ack
- ack received, waiting for "barrier ack", which is re-order epoch being
closed on the peer by acknowledging a "cache flush" equivalent
on the lower level device.

In the later two stages, depending on protocol, we may have already
completed this request to the upper layers, so it won't be found anymore
on device->pending_master_completion[] lists.

Track the oldest request yet to be sent (req_next), the oldest not yet
acknowledged (req_ack_pending) and the oldest "still waiting for
something from the peer" (req_not_net_done), doing short list walks on
the transfer log to find the next pending one whenever such a request
makes progress.

Now we have a fast way to look up the oldest requests,
don't do a transfer log walk every time.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 844a6ae7 21-Nov-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: add lists to find oldest pending requests

Adding requests to per-device fifo lists as soon as possible after
allocating them leaves a simple list_first_entry_or_null() to find the
oldest request, regardless what it is still waiting for.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# e5f891b2 21-Nov-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: gather detailed timing statistics for drbd_requests

Record (in jiffies) how much time a request spends in which stages.
Followup commits will use and present this additional timing information
so we can better locate and tackle the root causes of latency spikes,
or present the backlog for asynchronous replication.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0c066bc3 20-Mar-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: short-circuit in maybe_pull_ahead

If we already "pulled ahead", we can short-circuit,
and avoid logging the same messages over and over again.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 08d0dabf 20-Mar-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: application writes may set-in-sync in protocol != C

If "dirty" blocks are written to during resync,
that brings them in-sync.

By explicitly requesting write-acks during resync even in protocol != C,
we now can actually respect this.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 4dd726f0 11-Feb-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: get rid of drbd_queue_work_front

The last user was al_write_transaction, if called with "delegate",
and the last user to call it with "delegate = true" was the receiver
thread, which has no need to delegate, but can call it himself.

Finally drop the delegate parameter, drop the extra
w_al_write_transaction callback, and drop drbd_queue_work_front.

Do not (yet) change dequeue_work_item to dequeue_work_batch, though.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 44a4d551 21-Nov-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: refactor use of first_peer_device()

Reduce the number of calls to first_peer_device(). Instead, call
first_peer_device() just once to assign a local variable peer_device.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 35b5ed5b 03-Dec-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: reduce number of spinlock drop/re-aquire cycles

Instead of dropping and re-aquiring the spinlock around the submit,
just remember that we want to submit, and do that only once we have
dropped the spinlock for good.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 08535466 28-Apr-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: evaluate disk and network timeout on different requests

Just because it is the oldest not yet completed request
does not make it the oldest request waiting for disk.
Or waiting for the peer.

And we completely missed already completed requests
that would still hold references to activity log extents,
waiting only for the barrier ack.

Find two oldest not yet completely processed requests,
one that is still waiting for local completion,
and one that is still waiting for some response from the peer.
These may or may not be the same request object.

Then separately apply the network and disk timeouts, respectively.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# e4d7d6f4 28-Apr-2014 Lars Ellenberg <lars@linbit.com>

drbd: add back some fairness to AL transactions

When batching more updates to the activity log into single transactions,
we lost the ability for new requests to force themselves into the active
set: all preparation steps became non-blocking, and if all currently
hot extents keep busy, they could starve out new incoming requests
to cold extents for quite a while.

This can only happen if your IO backend accepts more IO operations per
average DRBD replication round trip time than you have al-extents
configured.

If we have incoming requests to cold extents,
at least do one blocking update per transaction.

In an artificial worst-case workload on SSD with an asynchronous 600 ms
replication link, with al-extents = 7 (the minimum we allow), and
concurrent full resynch, without this patch, some write requests have
been observed to be starved for 40 seconds.
With this patch, application observed a worst case latency of twice the
replication round trip time.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2f632aeb 28-Apr-2014 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: prepare sending side for REQ_DISCARD

Note that I do NOT call __drbd_chk_io_error for failed REQ_DISCARD.
That may be wrong, though, or needs to differ between EOPNOTSUPP and
other errors...

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 84b8c06b 28-Jul-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Create a dedicated struct drbd_device_work

drbd_device_work is a work item that has a reference to a device,
while drbd_work is a more generic work item that does not carry
a reference to a device.

All callbacks get a pointer to a drbd_work instance, those callbacks
that expect a drbd_device_work use the container_of macro to get it.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 0500813f 07-Jul-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Move conf_mutex from connection to resource

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 0b0ba1ef 27-Jun-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Add explicit device parameter to D_ASSERT

The implicit dependency on a variable inside the macro is problematic.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# d0180171 03-Jul-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove the terrible DEV hack

DRBD was using dev_err() and similar all over the code; instead of having to
write dev_err(disk_to_dev(device->vdisk), ...) to convert a drbd_device into a
kernel device, a DEV macro was used which implicitly references the device
variable. This is terrible; introduce separate drbd_err() and similar macros
with an explicit device parameter instead.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# a6b32bc3 31-May-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Introduce "peer_device" object between "device" and "connection"

In a setup where a device (aka volume) can replicate to multiple peers and one
connection can be shared between multiple devices, we need separate objects to
represent devices on peer nodes and network connections.

As a first step to introduce multiple connections per device, give each
drbd_device object a single drbd_peer_device object which connects it to a
drbd_connection object.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# bde89a9e 30-May-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename drbd_tconn -> drbd_connection

sed -i -e 's:all_tconn:connections:g' -e 's:tconn:connection:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# b30ab791 03-Jul-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename "mdev" to "device"

sed -i -e 's:mdev:device:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 54761697 30-May-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename struct drbd_conf -> struct drbd_device

sed -i -e 's:\<drbd_conf\>:drbd_device:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 01cd2636 19-Dec-2013 Rashika Kheria <rashika.kheria@gmail.com>

drivers: block: Mark functions as static in drbd_req.c

Mark functions drbd_request_prepare() and find_oldest_request() as
static in drbd/drbd_req.c because they are not used outside this file.

This eliminates the following warnings in drbd/drbd_req.c:
drivers/block/drbd/drbd_req.c:1037:1: warning: no previous prototype for ‘drbd_request_prepare’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_req.c:1323:22: warning: no previous prototype for ‘find_oldest_request’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 4f024f37 11-Oct-2013 Kent Overstreet <kmo@daterainc.com>

block: Abstract out bvec iterator

Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6


# 35f47ef1 23-Oct-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: avoid to shrink max_bio_size due to peer re-configuration

For a long time, the receiving side has spread "too large" incoming
requests over multiple bios. No need to shrink our max_bio_size
(max_hw_sectors) if the peer is reconfigured to use a different storage.

The problem manifests itself if we are not the top of the device stack
(DRBD is used a LVM PV).

A hardware reconfiguration on the peer may cause the supported
max_bio_size to shrink, and the connection handshake would now
unnecessarily shrink the max_bio_size on the active node.

There is no way to notify upper layers that they have to "re-stack"
their limits. So they won't notice at all, and may keep submitting bios
that are suddenly considered "too large for device".

We already check for compatibility and ignore changes on the peer,
the code only was masked out unless we have a fully established connection.
We just need to allow it a bit earlier during the handshake.

Also consider max_hw_sectors in our merge bvec function, just in case.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 607f25e5 27-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix drbd epoch write count for ahead/behind mode

The sanity check when receiving P_BARRIER_ACK does expect all write
requests with a given req->epoch to have been either all replicated,
or all not replicated.

Because req->epoch was assigned before calling maybe_pull_ahead(),
this expectation was not met, leading to an off-by-one in the sanity
check, and further to a "Protocol Error".

Fix: move the call to maybe_pull_ahead() a few lines up,
and assign req->epoch only after that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7074e4a7 27-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: only fail empty flushes if no good data is reachable

We completed empty flushes (blkdev_issue_flush()) with IO error
if we lost the local disk, even if we still have an established
replication link to a healthy remote disk.

Fix this to only report errors to upper layers,
if neither local nor remote data is reachable.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 45ad07b3 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: try hard to max out the updates per AL transaction

There may have been more incoming requests while we where preparing
the current transaction. Try to consolidate more updates into this
transaction until we make no more progres.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 7e8c288f 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: move start io accounting before activity log transaction

The IO accounting of the drbd "queue depth" was misleading.
We only started IO accounting once we already wrote the activity log.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 08a1ddab 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: consolidate as many updates as possible into one AL transaction

Depending on current IO depth, try to consolidate as many updates
as possible into one activity log transaction.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 779b3fe4 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: queue writes on submitter thread, unless they pass the activity log fastpath

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 113fef9e 22-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: prepare to queue write requests on a submit worker

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6d9febe2 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: split __drbd_make_request in before and after drbd_al_begin_io

This is in preparation to be able to defer requests that need to wait
for an activity log transaction to a submitter workqueue.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 56392d2f 19-Mar-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Clarify when activity log I/O is delegated to the worker thread

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2681f7f6 21-Jan-2013 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential protocol error and resulting disconnect/reconnect

When we notice a disk failure on the receiving side,
we stop sending it new incoming writes.

Depending on exact timing of various events, the same transfer log epoch
could end up containing both replicated (before we noticed the failure)
and local-only requests (after we noticed the failure).

The sanity checks in tl_release(), called when receiving a
P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
the expected epoch, and the number of contained writes matches
the number of ack'ed writes.

In this case, they counted both replicated and local-only writes,
but the peer only acknowledges those it has seen. We get a mismatch,
resulting in a protocol error and disconnect/reconnect cycle.

Messages logged are
"BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"

A similar issue can also be triggered when starting a resync while
having a healthy replication link, by invalidating one side, forcing a
full sync, or attaching to a diskless node.

Fix this by closing the current epoch if the state changes in a way
that would cause the replication intent of the next write.

Epochs now contain either only non-replicated,
or only replicated writes.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 42839f65 27-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: log request sector offset and size for IO errors

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# edc9f5eb 27-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: always write bitmap on detach

If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.

Always (try to) write out the changed bitmap pages before going diskless.

That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.

If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.

If that superblock does not make it to disk either, we are sorry.

Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 70f17b6b 03-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: differentiate early and later "postponing" of requests

We use the RQ_POSTPONED flag to mark a request for several reasons.

It may be a conflicting request in a dual-primary setup,
where conflict detection and resolution on the peer decided that
this request needs to be re-submitted, it needs to re-enter
drbd_make_request() to fix the data divergence caused by these
conflicting, partially overlapping, quasi-simultaneous requests.

In this case we need to mark the corresponding area as out-of-sync,
before we call drbd_al_complete_io().

We also use the RQ_POSTPONED flag to just "push back" a request,
before even processing it, if IO is suspended for some reason.
In this case, as this request was neither submitted nor sent yet,
we must not touch the bitmap.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 76590cd1 29-Aug-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Fix postponed requests

A postponed request might has RQ_IN_ACT_LOG already set, but
is POSTPONED before it gets something in the RQ_LOCAL_MASK
set. Up to now this caused a left-over active extent.

Fix that by only testing for the RQ_IN_ACT_LOG bit in drbd_req_destroy()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d7644018 28-Aug-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Fix postponed requests

* Postponed requests should not set or clear out-of-sync marks
* When a request gets postponed we need to drop its reference
mdev->local_cnt (put_ldev()).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 5af2e8ce 14-Aug-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Fix completion of requests while the device is suspended

In various places (E.g. CONNECTION_LOST_WHILE_PENDING) the
RQ_COMPLETION_SUSP mask is passed in the clear set to mod_rq_state().

The issue was that it tried to clear the RQ_COMPLETION_SUSP bit
out of the state mask first, and eventuelly set it afterwards,
in the drbd_req_put_completion_ref() function.

Fixed that by moving the reference getting out of
drbd_req_put_completion_ref() into the mod_rq_state(), before the place
where the extra reference might be put.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d4dabbe2 31-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: disambiguation, s/P_DISCARD_WRITE/P_SUPERSEDED/

To avoid confusion with REQ_DISCARD aka TRIM, rename our
"discard concurrent write acks" from P_DISCARD_WRITE to P_SUPERSEDED.

At the same time, rename the drbd request event DISCARD_WRITE
to CONFLICT_RESOLVED. It already triggers both successful completion
or restart of the request, depending on our RQ_POSTPONED flag.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 46e21bba 06-Aug-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: NEG_ACK does not imply a barrier-ack

Don't drop a request from the transfer log just because it was NEG_ACKED.
We need it around to be able to verify P_BARRIER_ACKs against the
transver log.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 99b4d8fe 06-Aug-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: only start a new epoch, if the current epoch contains writes

Almost all code paths calling start_new_tl_epoch() guarded it with
if (... current_tle_writes > 0 ... ).
Just move that inside start_new_tl_epoch().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8a0bab2a 07-Aug-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Finish requests that completed while IO was frozen

Requests of an acked epoch are stored on the barrier_acked_requests list. In
case the private bio of such a request completes while IO on the drbd device
is suspended [req_mod(completed_ok)] then the request stays there.

When thawing IO because the fence_peer handler returned, then we use
tl_clear() to apply the connection_lost_while_pending event to all requests
on the transfer-log and the barrier_acked_requests list.

Up to now the connection_lost_while_pending event was not applied
on requests on the barrier_acked_requests list. Fixed that.

I.e. now the connection_lost_while_pending and resend events are
applied to requests on the barrier_acked_requests list. For that
it is necessary that the resend event finishes (local only)
READS correctly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 519b6d3e 02-Aug-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix drbd wire compatibility for empty flushes

DRBD has a concept of request epochs or reorder-domains,
which are separated on the wire by P_BARRIER packets.

Older DRBD is not able to handle zero-sized requests at all,
so we need to map empty flushes to these drbd barriers.

These are the equivalent of empty flushes, and
by default trigger flushes on the receiving side anyways
(unless not supported or explicitly disabled),
so there is no need to handle this differently in newer drbd either.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 81a3537a 30-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: announce FLUSH/FUA capability to upper layers

In 8.4, we may have bios spanning two activity log extents.
Fixup drbd_al_begin_io() and drbd_al_complete_io() to deal with zero sized bios.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0c849666 30-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: differentiate between normal and forced detach

Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 3b9ef85e 30-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix null pointer dereference with on-congestion policy when diskless

We must not look at mdev->actlog, unless we have a get_ldev() reference.
It also does not make much sense to try to disconnect or pull-ahead of
the peer, if we don't have good local data.

Only even consider congestion policies, if our local disk is D_UP_TO_DATE.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 9a278a79 24-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: allow read requests to be retried after force-detach

Sometimes, a lower level block device turns into a tar-pit,
not completing requests at all, not even doing error completion.

We can force-detach from such a tar-pit block device,
either by disk-timeout, or by drbdadm detach --force.

Queueing for retry only from the request destruction path (kref hit 0)
makes it impossible to retry affected read requests from the peer,
until the local IO completion happened, as the locally submitted
bio holds a reference on the drbd request object.

If we can only complete READs when the local completion finally
happens, we would not need to force-detach in the first place.

Instead, queue for retry where we otherwise had done the error completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 934722a2 24-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: __req_mod: make DISCARD_WRITE and independend case

cherry-picked and adapted from drbd 9 devel branch

This looks cleaner to me,
and also gets rid of the other ugly if-inside-case-fall-through.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a0d856df 24-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: base completion and destruction of requests on ref counts

cherry-picked and adapted from drbd 9 devel branch

The logic for when to get or put a reference is in mod_rq_state().

To not get confused in the freeze/thaw respectively resend/restart
paths, or when cleaning up requests waiting for P_BARRIER_ACK, this
also introduces additional state flags:
RQ_COMPLETION_SUSP, and RQ_EXP_BARR_ACK.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# b406777e 24-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: introduce completion_ref and kref to struct drbd_request

cherry-picked and adapted from drbd 9 devel branch

completion_ref will count pending events necessary for completion.
kref is for destruction.

This only introduces these new members of struct drbd_request,
a followup patch will make actual use of them.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 5df69ece 24-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: __drbd_make_request() is now void

The previous commit causes __drbd_make_request() to always return 0.
Change it to void.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 5da9c836 29-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: better separate WRITE and READ code paths in drbd_make_request

cherry-picked and adapted from drbd 9 devel branch

READs will be interesting to at most one connection,
WRITEs should be interesting for all established connections.

Introduce some helper functions to hopefully make this easier to follow.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# b6dd1a89 28-Nov-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: remove struct drbd_tl_epoch objects (barrier works)

cherry-picked and adapted from drbd 9 devel branch

DRBD requests (struct drbd_request) are already on the per resource
transfer log list, and carry their epoch number. We do not need to
additionally link them on other ring lists in other structs.

The drbd sender thread can recognize itself when to send a P_BARRIER,
by tracking the currently processed epoch, and how many writes
have been processed for that epoch.

If the epoch of the request to be processed does not match the currently
processed epoch, any writes have been processed in it, a P_BARRIER for
this last processed epoch is send out first.
The new epoch then becomes the currently processed epoch.

To not get stuck in drbd_al_begin_io() waiting for P_BARRIER_ACK,
the sender thread also needs to handle the case when the current
epoch was closed already, but no new requests are queued yet,
and send out P_BARRIER as soon as possible.

This is done by comparing the per resource "current transfer log epoch"
(tconn->current_tle_nr) with the per connection "currently processed
epoch number" (tconn->send.current_epoch_nr), while waiting for
new requests to be processed in wait_for_work().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d5b27b01 14-Nov-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: move the drbd_work_queue from drbd_socket to drbd_connection

cherry-picked and adapted from drbd 9 devel branch
In 8.4, we don't distinguish between "resource work" and "connection
work" yet, we have one worker for both, as we still have only one connection.

We only ever used the "data.work",
no need to keep the "meta.work" around.

Move tconn->data.work to tconn->sender_work.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# b379c41e 17-Nov-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: transfer log epoch numbers are now per resource

cherry-picked from drbd 9 devel branch.

In preparation of multiple connections, the "barrier number" or
"epoch number" needs to be tracked per-resource, not per connection.
The sequence number space will not be reset anymore.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 9d05e7c4 17-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: rename drbd_restart_write to drbd_restart_request

Meanwhile, this is used to restart failed READ requests as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 629663c9 08-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix wrong assert in completion/retry path of failed local reads

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ab53b90e 08-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix local read error hung forever

The commit
drbd: simplify retry path of failed READ requests
simplified it too much:
it just did not do anything for local read errors.

Add the missing req_may_be_completed_not_susp() to the
READ_COMPLETED_WITH_ERROR case.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 07be15b1 07-May-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix resend/resubmit of frozen IO

DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith),
or because we lost access to data (on-no-data-accessible suspend-io).

Resuming from there (re-connect, or re-attach, or explicit admin
intervention) should "just work".

Unfortunately, if the re-attach/re-connect did not happen within
the timeout, since the commit

drbd: Implemented real timeout checking for request processing time

if so configured, the request_timer_fn() would timeout and
detach/disconnect virtually immediately.

This change tracks the most recent attach and connect, and does not
timeout within <configured timeout interval> after attach/connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 648e46b5 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: complete_conflicting_writes() should not care about connections

complete_conflicting_writes() should not cause -EIO.
It should not timeout either, or care for connection states.

Connection timeout is detected elsewhere, and it's cleanup path is
supposed to remove any pending requests or peer_requests from the
write_requests tree.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 4439c400 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: simplify retry path of failed READ requests

If a local or remote READ request fails, just push it back to the retry
workqueue. It will re-enter __drbd_make_request, and be re-assigned to
a suitable local or remote path, or failed, if we do not have access to
good data anymore.

This obsoletes w_read_retry_remote(),
and eliminates two goto...retry blocks in __req_mod()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2415308e 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: move put_ldev from __req_mod() to the endio callback

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6870ca6d 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: factor out master_bio completion and drbd_request destruction paths

In preparation for multiple connections and reference counting,
separate the code paths for completion of the master bio
and destruction of the request object.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8d6cdd78 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: conflicting writes: make wake_up of waiting peer_requests explicit

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0afd569a 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ea9d6729 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 27a434fe 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: make OOS_HANDED_TO_NETWORK its own case

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2312f0b3 24-Nov-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential deadlock during "restart" of conflicting writes

w_restart_write(), run from worker context, calls __drbd_make_request()
and further drbd_al_begin_io(, delegate=true), which then
potentially deadlocks. The previous patch moved a BUG_ON to expose
such call paths, which would now be triggered.

Also, if we call __drbd_make_request() from resource worker context,
like w_restart_write() did, and that should block for whatever reason
(!drbd_state_is_stable(), resource suspended, ...),
we potentially deadlock the whole resource, as the worker
is needed for state changes and other things.

Create a dedicated retry workqueue for this instead.

Also make sure that inc_ap_bio()/dec_ap_bio() are properly paired,
even if do_retry() needs to retry itself,
in case __drbd_make_request() returns != 0.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 81f44862 26-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: Fix a potential race that could case data inconsistency

When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE
at the same time, and it happens that the line

remote = remote && drbd_should_do_remote(s);

stills sees C_WF_BITMAP_S, and

send_oos = rw == WRITE && drbd_should_send_oos(s);

already sees C_SYNC_SOURCE both are 0.

This causes the write to not be mirrored, but marked as out-of-sync on the
Sync_Source node.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 38a05c16 07-Mar-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Consider that bio->bi_bdev might be modified below DRBD

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 72585d24 22-Feb-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: add missing part_round_stats to _drbd_start_io_acct

Without this, iostat frequently sees bogus svctime and >= 100% "utilization".

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 93f5afe9 22-Feb-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: If disk timeout expires fail only the affected volume

...and not all volumes of the resource

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 69b6a3b1 20-Dec-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: restart loop in drbd_make_request() [prepare for Linux-3.2]

With Linux-3.2 generic_make_request() will no longer loop over
the request function until it finally returns 0. Move this
loop into our drbd_make_request() function.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# e8cdc343 13-Dec-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Consider that read requests could be NEG_ACKEDed

ap_in_flight only counts writes. NEG_ACKED is an action
on a request that might be called for reads and writes.

This bug was there forever, but it becomes much more
relevant with the read balincing code.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 57bcb6cf 03-Dec-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Do not call generic_make_request() while holding req_lock

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d60de03a 17-Nov-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Load balancing method: striping

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 380207d0 10-Nov-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Load balancing of read requests

New config option for the disk secition "read-balancing", with
the values: prefer-local, prefer-remote, round-robin, when-congested-remote.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6936fcb4 10-Nov-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Move the CREATE_BARRIER flag from connection to device

That is necessary since the whole transfer log is per connection(tconn)
and not per device(mdev).

This bug caused list corruption on the worker list. When a barrier is queued
for sending in the context of one device, another device did not see the
CREATE_BARRIER bit, and queued the same object again -> list corruption.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 376694a0 07-Nov-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Silenced compiler warnings

Since version 4.6.1 gcc warns about variables that get
a value assigned, but which are never read later on.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a209b4ae 16-Aug-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Update some outdated comments to match the code

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 97ddb687 15-Jul-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: detach must not try to abort non-local requests

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 3b03ad59 15-Jul-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Do not mod_timer() with a past time

In case we can not find out why the request takes too long
(happens e.g. when IO got suspended on DRBD level). rearm
the timer with a reasonable value.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# cdfda633 05-Jul-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: detach from frozen backing device

* drbd-8.3:
documentation: Documented detach's --force and disk's --disk-timeout
drbd: Implemented the disk-timeout option
drbd: Force flag for the detach operation
drbd: Allow new IOs while the local disk in in FAILED state
drbd: Bitmap IO functions can not return prematurely if the disk breaks
drbd: Added a kref to bm_aio_ctx
drbd: Hold a reference to ldev while doing meta-data IO
drbd: Keep a reference to the bio until the completion handler finished
drbd: Implemented wait_until_done_or_disk_failure()
drbd: Replaced md_io_mutex by an atomic: md_io_in_use
drbd: moved md_io into mdev
drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk
drbd: Keep a reference to barrier acked requests

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 44ed167d 19-Apr-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: rcu_read_lock() and rcu_dereference() for tconn->net_conf

Removing the get_net_conf()/put_net_conf() calls

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 303d1448 13-Apr-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Runtime changeable wire protocol

The wire protocol is no longer a property that is negotiated
between the two peers. It is now expressed with two bits
(DP_SEND_WRITE_ACK and DP_SEND_RECEIVE_ACK) in each data
packet. Therefore the primary node is free to change the
wire protocol at any time without disconnect/reconnect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8b924f1d 01-Mar-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Use tconn in request_timer_fn()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2aebfabb 28-Mar-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Renamed id_susp(union drbd_state s) to drbd_suspended(struct drbd_conf *)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 23361cf3 31-Mar-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: get rid of bio_split, allow bios of "arbitrary" size

Where "arbitrary" size is currently 1 MiB, which is the BIO_MAX_SIZE
for architectures with 4k PAGE_CACHE_SIZE (most).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 181286ad 31-Mar-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: preparation commit, pass drbd_interval to drbd_al_begin/complete_io

We want to avoid bio_split for bios crossing activity log boundaries.
So we may need to activate two activity log extents "atomically".
drbd_al_begin_io() needs to know more than just the start sector.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8f7bed77 19-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename various functions from *_oos_* to *_out_of_sync_* for clarity

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0da34df0 19-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: drbd_may_do_local_read(): Use bool/true/false

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 1097e9a8 17-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove unnecessary assertion

This is also checked further below in the same function.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ccae7868 26-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: log request sector offset and size for IO errors

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# a2a3c74f 21-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: always write bitmap on detach

If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.

Always (try to) write out the changed bitmap pages before going diskless.

That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.

If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.

If that superblock does not make it to disk either, we are sorry.

Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 06f10adb 22-Sep-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: prepare for more than 32 bit flags

- struct drbd_conf { ... unsigned long flags; ... }
+ struct drbd_conf { ... unsigned long drbd_flags[N]; ... }

And introduce wrapper functions for test/set/clear bit operations
on this member.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 509fc019 31-Jul-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Finish requests that completed while IO was frozen

Requests of an acked epoch are stored on the barrier_acked_requests list. In
case the private bio of such a request completes while IO on the drbd device
is suspended [req_mod(completed_ok)] then the request stays there.

When thawing IO because the fence_peer handler returned, then we use
tl_clear() to apply the connection_lost_while_pending event to all requests
on the transfer-log and the barrier_acked_requests list.

Up to now the connection_lost_while_pending event was not applied
on requests on the barrier_acked_requests list. Fixed that.

I.e. now the connection_lost_while_pending and resend events are
applied to requests on the barrier_acked_requests list. For that
it is necessary that the resend event finishes (local only)
READS correctly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 227f052f 31-Jul-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix drbd wire compatibility for empty flushes

DRBD has a concept of request epochs or reorder-domains,
which are separated on the wire by P_BARRIER packets.

Older DRBD is not able to handle zero-sized requests at all,
so we need to map empty flushes to these drbd barriers.

These are the equivalent of empty flushes, and
by default trigger flushes on the receiving side anyways
(unless not supported or explicitly disabled),
so there is no need to handle this differently in newer drbd either.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a73ff323 25-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: announce FLUSH/FUA capability to upper layers

Unconditionally announce FLUSH/FUA to upper layers.
If the lower layers on either node do not actually support this,
generic_make_request() will deal with it.

If this causes performance regressions on your setup,
make sure there are no volatile caches involved,
and mount -o nobarrier or equivalent.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 383606e0 14-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: differentiate between normal and forced detach

Aborting local requests (not waiting for completion from the lower level
disk) is dangerous: if the master bio has been completed to upper
layers, data pages may be re-used for other things already.
If local IO is still pending and later completes,
this may cause crashes or corrupt unrelated data.

Only abort local IO if explicitly requested.
Intended use case is a lower level device that turned into a tarpit,
not completing io requests, not even doing error completion.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0d5934e3 08-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix null pointer dereference with on-congestion policy when diskless

We must not look at mdev->actlog, unless we have a get_ldev() reference.
It also does not make much sense to try to disconnect or pull-ahead of
the peer, if we don't have good local data.

Only even consider congestion policies, if our local disk is D_UP_TO_DATE.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 1ed25b26 08-Jun-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix list corruption by failing but already aborted reads

If a read is aborted due to force-detach of a supposedly unresponsive
local backing device, and retried on the peer, it can happen that the
local request later still completes (hopefully with an error).
As it may already have been completed to upper layers meanwhile,
it must not be retried again now.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# f6d0a8db 29-Apr-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Restore the request restart logic

It got lost with the commit 5a7bbad27a410350e64a2d7f5ec18fc73836c14f
"block: remove support for bio remapping from ->make_request"

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ba280c09 25-Apr-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix resend/resubmit of frozen IO

DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith),
or because we lost access to data (on-no-data-accessible suspend-io).

Resuming from there (re-connect, or re-attach, or explicit admin
intervention) should "just work".

Unfortunately, if the re-attach/re-connect did not happen within
the timeout, since the commit
drbd: Implemented real timeout checking for request processing time
if so configured, the request_timer_fn() would timeout and
detach/disconnect virtually immediately.

This change tracks the most recent attach and connect, and does not
timeout within <configured timeout interval> after attach/connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 46385c84 16-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: move put_ldev from __req_mod() to the endio callback

One invocation in the endio handler is good enough,
we don't need mention it for each of the different ways
it calls __req_mod().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d64957c9 23-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE

Just because this request happened during a resync does
not mean it may pretend to have been barrier-acked.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 41c4a003 11-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended

READ_RETRY_REMOTE_CANCELED needs to be grouped with the other _CANCELED
cases, not with CONNECTION_LOST_WHILE_PENDING, as that would complete
(fail) the bio even if the device became suspended.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6d49e101 11-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: make OOS_HANDED_TO_NETWORK its own case

OOS_HANDED_TO_NETWORK should not be grouped with the various
*_CANCELED/*_FAILED cases.
Also, not only clear the RQ_NET_QUEUED flag, but also mark it RQ_NET_DONE,
so it can be distinguished from a local-only request even after that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 001a8868 08-Mar-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential data corruption and protocol error

We assumed only bios with bi_idx == 0 would end up
in drbd_make_request().

That is wrong.

At least device mapper, in __clone_and_map(), may submit
clones only covering a partial bio, but sharing
the original bvec, by adjusting bi_idx and relevant
other bio members of the clone.

We used __bio_for_each_segment() in various places,
even though that is documented as
* drivers should not use the __ version unless they _really_ want to
* run through the entire bio and not just pending pieces

Impact: we would send the full bio bvec, even for the clone
with bi_idx > 0, which will cause data corruption on the
peer (because we submit wrong data at the clone offset),
and will cause a DRBD protocol error, disconnect/reconnect
and resync (thus fixing the corruption),
because the next package header would be expected right
in the middle of the sent data, causing DRBD magic mismatch.

Fix: drop the assert, and use bio_for_each_segment()
instead of the __ version.

Conflicts:

drbd/drbd_tracing.c

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# fc28845b 15-Jan-2012 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Fix a potential race that could case data inconsistency

When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE
at the same time, and it happens that the line

remote = remote && drbd_should_do_remote(s);

stills sees C_WF_BITMAP_S, and

send_oos = rw == WRITE && drbd_should_send_oos(s);

already sees C_SYNC_SOURCE both are 0.

This causes the write to not be mirrored, but marked as out-of-sync on the
Sync_Source node.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 031a7c17 21-Jan-2012 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: add missing part_round_stats to _drbd_start_io_acct

Without this, iostat frequently sees bogus svctime and >= 100% "utilization".

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# dfa8bedb 29-Jun-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Implemented the disk-timeout option

When the disk-timeout is active, and it expires for a single request,
we consider the local disk as D_FAILED. Note: With this change,
I made both timeout based state transitions HARD state transitions.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2b4dd36f 14-Mar-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2f5cdd0b 21-Feb-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Converted the transfer log from mdev to tconn

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 1b3bb47d 28-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove redundant check

Opening a device only succeeds on a primary node, or when explicitly
setting the allow_oos module parameter to allow opening the device
read-only on a secondary node. There is no other way that a request can
get into drbd_make_request(), so this code cannot trigger.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7be8da07 21-Feb-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Improve how conflicting writes are handled

The previous algorithm for dealing with overlapping concurrent writes
was generating unnecessary warnings for scenarios which could be
legitimate, and did not always handle partially overlapping requests
correctly. Improve it algorithm as follows:

* While local or remote write requests are in progress, conflicting new
local write requests will be delayed (commit 82172f7).

* When a conflict between a local and remote write request is detected,
the node with the discard flag decides how to resolve the conflict: It
will ask its peer to discard conflicting requests which are fully
contained in the local request and retry requests which overlap only
partially. This involves a protocol change.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8c387def 18-Feb-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: simplify condition in drbd_may_do_local_read()

fold
if (x >= (N+1))
return 0;
if (x < N)
return 0;
into
if (x != N)
return 0;

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# c670a398 20-Feb-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Use the IS_ALIGNED() macro in some more places

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8ca9844f 20-Feb-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove obsolete comment

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# fcefa62e 17-Feb-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename drbd_endio_{pri,sec} -> drbd_{,peer_}request_endio

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a21e9298 08-Feb-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Moved the mdev member into drbd_work (from drbd_request and drbd_peer_request)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6024fece 28-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Defer new writes when detecting conflicting writes

Before submitting a new local write request, wait for any conflicting
local or remote requests to complete.

We could assume that the new request occurred first and that the
conflicting requests overwrote it (and therefore discard the new
reques), but we know for sure that the new request occurred after the
conflicting requests and so this behavior would we weird. We would also
end up with the wrong result if the new request is not fully contained
within the conflicting requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ddd8877d 28-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove unnecessary reference counting left-over

Nothing in this function accesses mdev->tconn->net_conf, so there is no
need for get_net_conf() / put_net_conf() anymore.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 5e472264 27-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: _req_conflicts(): Get rid of the epoch_entries tree

Instead of keeping a separate tree for local and remote write requests
for finding requests and for conflict detection, use the same tree for
both purposes. Introduce a flag to allow distinguishing the two
possible types of entries in this tree.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 53840641 28-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Allow to wait for the completion of an epoch entry as well

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a500c2ef 27-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: struct drbd_request: Introduce a new collision flag

This flag is set when a processes puts itself to sleep to wait for a
conflicting request to complete.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 9e204cdd 26-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Move some functions to where they are used

Move drbd_update_congested() to drbd_main.c, and drbd_req_new() and
drbd_req_free() to drbd_req.c: those functions are not used anywhere
else.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 5a7bbad2 11-Sep-2011 Christoph Hellwig <hch@infradead.org>

block: remove support for bio remapping from ->make_request

There is very little benefit in allowing to let a ->make_request
instance update the bios device and sector and loop around it in
__generic_make_request when we can archive the same through calling
generic_make_request from the driver and letting the loop in
generic_make_request handle it.

Note that various drivers got the return value from ->make_request and
returned non-zero values for errors.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 87eeee41 19-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: moved req_lock and transfer log from mdev to tconn

sed -i \
-e 's/mdev->req_lock/mdev->tconn->req_lock/g' \
-e 's/mdev->unused_spare_tle/mdev->tconn->unused_spare_tle/g' \
-e 's/mdev->newest_tle/mdev->tconn->newest_tle/g' \
-e 's/mdev->oldest_tle/mdev->tconn->oldest_tle/g' \
-e 's/mdev->out_of_sequence_requests/mdev->tconn->out_of_sequence_requests/g' \
*.[ch]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 31890f4a 19-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: moved agreed_pro_version, last_received and ko_count to tconn

sed -i \
-e 's/mdev->agreed_pro_version/mdev->tconn->agreed_pro_version/g' \
-e 's/mdev->last_received/mdev->tconn->last_received/g' \
-e 's/mdev->ko_count/mdev->tconn->ko_count/g' \
*.[ch]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# e42325a5 19-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: moved data and meta from mdev to tconn

Patch mostly:

sed -i -e 's/mdev->data/mdev->tconn->data/g' \
-e 's/mdev->meta/mdev->tconn->meta/g' \
*.[ch]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# b2fb6dbe 19-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: moved net_cont and net_cnt_wait from mdev to tconn

Patch partly generated by:

sed -i -e 's/get_net_conf(mdev)/get_net_conf(mdev->tconn)/g' \
-e 's/put_net_conf(mdev)/put_net_conf(mdev->tconn)/g' \
-e 's/get_net_conf(odev)/get_net_conf(odev->tconn)/g' \
-e 's/put_net_conf(odev)/put_net_conf(odev->tconn)/g' \
*.[ch]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 89e58e75 19-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: moved net_conf from mdev to tconn

Besides moving the struct member, everything else is generated by:

sed -i -e 's/mdev->net_conf/mdev->tconn->net_conf/g' \
-e 's/odev->net_conf/odev->tconn->net_conf/g' \
*.[ch]

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8554df1c 25-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Convert all constants in enum drbd_req_event to upper case

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# bb3bfe96 21-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Remove the unused hash tables

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8b946255 20-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Use interval tree for overlapping epoch entry detection

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 010f6e67 14-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Put sector and size in struct drbd_epoch_entry into struct drbd_interval

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# dac1389c 21-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Add read_requests tree

We do not do collision detection for read requests, but we still need to
look up the request objects when we receive a package over the network.
Using the same data structure for read and write requests results in
simpler code once the tl_hash and app_reads_hash tables are removed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# de696716 20-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Use interval tree for overlapping write request detection

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# ace652ac 03-Jan-2011 Andreas Gruenbacher <agruen@linbit.com>

drbd: Put sector and size in struct drbd_request into struct drbd_interval

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 24c4830c 21-May-2011 Bart Van Assche <bvanassche@acm.org>

drbd: Fix spelling

Found these with the help of ispell -l.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>


# 76727f68 16-May-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential activity log refcount imbalance in error path

It is no longer sufficient to trigger on local WRITE,
we need to check on (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io also in the error path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 03567812 13-Jan-2011 Or Gerlitz <ogerlitz@voltaire.com>

drbd: drop code present under #ifdef which is relevant to 2.6.28 and below

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7fde2be9 01-Mar-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Implemented real timeout checking for request processing time

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 039312b6 21-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Removed left over, now wrong comments

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# e636db5b 21-Jan-2011 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential imbalance of ap_in_flight

When we receive a barrier ack, we walk the ring list of drbd requests
in the transfer log of the respective epoch, do some housekeeping,
and free those objects.

We tried to keep epochs of mirrored and unmirrored drbd requests
separate, and assert that no local-only requests are present in a
barrier_acked epoch.

It turns out that this has quite a number of corner cases and would
add bloated code without functional benefit.

We now revert the (insufficient) commits
drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions
drbd: Ensure that an epoch contains only requests of one kind
and instead fix the processing of barrier acks to cope with
a mix of local-only and mirrored requests.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6a35c45f 17-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Ensure that an epoch contains only requests of one kind

The assert in drbd_req.c:755 forces us to have only requests of
one kind in an epoch. The two kinds we distinguish here are:
local-only or mirrored.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 71c78cfb 14-Jan-2011 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Nothing should stop SyncSource -> Ahead transitions

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# da0a7816 23-Dec-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Be more careful with SyncSource -> Ahead transitions

We may not get from SyncSource to Ahead if we have sent some
P_RS_DATA_REPLY packets to the peer and are waiting for
P_WRITE_ACK.

Again, this is not relevant for proper tuned systems, but makes
sure that the not-tuned system does not get diverging bitmaps.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# c88d65e2 20-Dec-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Documenting drbd_should_do_remote() and drbd_should_send_oos()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 81e84650 09-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: Use the standard bool, true, and false keywords

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0cf9d27e 07-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: Get rid of unnecessary macros (2)

The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no
apparent reason.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2f58dcfc 13-Dec-2010 Andreas Gruenbacher <agruen@linbit.com>

drbd: Rename drbd_make_request_26 to drbd_make_request

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8a3c1044 05-Dec-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix regression, we need to close drbd epochs during normal operation

commit e2041475e6ddb081734d161f6421977323f5a9b9
drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap

Contained a bad chunk that tried to optimize away drbd barriers during
bitmap exchange, but accidentally dropped them for normal mode as well.

Impact: depending on activity log size and access pattern, activity log
extents may not be recycled in time, causeing IO to block indefinetely.

Fix: skip drbd barriers only if there is no connection to send them on,
or the request being completed has not been on the network at all.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 3719094e 09-Nov-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap

* C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets
written to disk. Locking out of app-IO is done by using the
drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days.
It is no longer necessary to lock out app-IO based on the connection
state.
App-IO that may come in after the BITMAP_IO flag got cleared before the
state transition to C_SYNC_(SOURCE|TARGET) does not get mirrored, sets
a bit in the local bitmap, that is already set, therefore changes nothing.

* C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets).
With that we make sure they have the same number of bits when going
into the C_SYNC_(SOURCE|TARGET) connection state.

* C_UNCONNECTED: The receiver starts, no need to lock out IO.

* C_DISCONNECTING: in drbd_disconnect() we had a wait_event()
to wait until ap_bio_cnt reaches 0. Removed that.

* C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE
C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING

* C_WF_REPORT_PARAMS: IO still possible since that is still
like C_WF_CONNECTION.

And we do not need to send barriers in C_WF_BITMAP_S connection state.

Allow concurrent accesses to the bitmap when receiving the bitmap.
Everything gets ORed anyways.

A drbd_free_tl_hash() is in after_state_chg_work(). At that point
all the work items of the last connections must have been processed.

Introduced a call to drbd_free_tl_hash() into drbd_free_mdev()
for paranoia reasons.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# aeda1cd6 09-Nov-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Begin to account BIO processing time before inc_ap_bio()

Since inc_ap_bio() might sleep already

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 73a01a18 27-Oct-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: New packet for Ahead/Behind mode: P_OUT_OF_SYNC

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 67531718 26-Oct-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Implemented two new connection states Ahead/Behind

In this connection mode, the ahead node no longer replicates
application IO. The behind's disk becomes out dated.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 759fbdfb 26-Oct-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Track the numbers of sectors in flight

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 1816a2b4 11-Nov-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: properly use max_hw_sectors to limit the our bio size

To ease tracking of bios in some hash tables, we want it to
not cross certain boundaries (128k, used to be 32k).
We limit the maximum bio size using queue parameters.

Historically some defines and variables we use there have been named
max_segment_size, which was misguided. Rename them to max_bio_size,
and use [blk_]queue_max_hw_sectors where appropriate.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7eaceacc 10-Mar-2011 Jens Axboe <jaxboe@fusionio.com>

block: remove per-queue plugging

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 650789c8 25-Aug-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Removed checks for REQ_HARDBARRIER on incomming BIOs

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 8825f7c3 21-Oct-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Silenced an assert

That assertion's condition needed adjustment for today's semantics

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# fb2c7a10 18-Oct-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: rate limit an error message

If we don't rate limit it, and you happen to log err level messages via
serial console, an IO error on a disconnected Primary may cause serious
unresponsiveness.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 6719fb03 18-Oct-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix potential data divergence after multiple failures

If we get an IO-error during an activity log transaction,
if we failed to write the bitmap of the evicted extent,
we must not write the transaction itself.
If we failed to write the transaction,
we must not even submit the corresponding bio,
as its extent is not yet marked in the activity log.

Otherwise, if this was a disconneted Primary (degraded cluster), which
now lost its disk as well, and we later re-attach the same backend
storage, we possibly "forget" to resync some parts of the disk that
potentially have been changed.

On the receiving side, when receiving from a peer with unhealthy disk,
checking for pdsk == D_DISKLESS is not enough, we need to set out of
sync and do AL transactions for everything pdsk < D_INCONSISTENT on the
receiving side.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# fb22c402 08-Sep-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Track the reasons to suspend IO in dedicated state bits

There are three ways to get IO suspended:

* Loss of any access to data
* Fence-peer-handler running
* User requested to suspend IO

Track those in different bits, so that one condition clearing its
state bit does not interfere with the other two conditions.

Only when the user resumes IO he overrules all three bits.

The fact is hidden from the user, he sees only a single suspend
bit.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 0778286a 30-Aug-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Disable activity log updates when the whole device is out of sync

When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.

As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.

BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# d28fd092 09-Jul-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix list corruption (recent regression)

The commit 288f422ec13667de40b278535d2a5fb5c77352c4
drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.

Fix this by adding a proper cleanup section for that code path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# cfa03415 23-Jun-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Allow tl_restart() to do IO completion while IO is suspended

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 481c6f50 22-Jun-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Ensure that the peer was not rebootet in the meantime before resending TL

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 47ff2d0a 18-Jun-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Do not allow a fencing-policy of resource-and-stonith with protocol A

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 265be2d0 31-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Finished the "on-no-data-accessible suspend-io;" functionality

When no data is accessible (no connection to the peer, nor a local disk)
allow the user to select to freeze all IO operations instead of getting
IO errors.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 905cd7d8 10-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Removed redundant error checks in the request code path

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 11b58e73 12-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: factored tl_restart() out of tl_clear().

If IO was frozen for a temporal network outage, resend the
content of the transfer-log into the newly established connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 2a80699f 09-Jun-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: mod_req has now a return value

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 288f422e 27-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Track all IO requests on the TL, not writes only

With that the drbd_fail_pending_reads() function becomes obsolete.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7e602c0a 27-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writes

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 7b6d91da 07-Aug-2010 Christoph Hellwig <hch@lst.de>

block: unify flags for struct bio and struct request

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 2a0ab2cd 26-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Reduce verbosity

The "Local READ/WRITE failed" messages are too verbose.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# d255e5ff 27-May-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fix hang on local read errors while disconnected

"canceled" w_read_retry_remote never completed, if they have been
canceled after drbd_disconnect connection teardown cleanup has already
run (or we are currently not connected anyways).

Fixed by not queueing a remote retry if we already know it won't work
(pdsk not uptodate), and cleanup ourselves on "cancel", in case we hit a
race with drbd_disconnect.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 32fa7e91 26-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: Removed the now empty w_io_error() function

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>


# 9a25a04c 10-May-2010 Philipp Reisner <philipp.reisner@linbit.com>

drbd: If we detect late that IO got frozen, retry after we thawed.

If we detect late (= after grabing mdev->req_lock) that IO got frozen, we
return 1 to generic_make_request(), which simply will retry to make a
request for that bio.

In the subsequent call of generic_make_request() into drbd_make_request_26()
we sleep in inc_ap_bio().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a1c88d0d 14-May-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: always use_bmbv, ignore setting

Now that the peer may handle multi-bio EEs,
we can ignore the peer's limit,
and concentrate on the limits of the local IO stack.

This is safe accross drbd protocol versions,
as our queue_max_sectors() will be adjusted accordingly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 979f5c7f 06-Apr-2010 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: fail_requests_early: remove incorrect and unnecessary optimization

The condition does not fit the commend (I may well be Primary,
even if I lost the disk earlier and now the connection).

And this is catched below anyways, where it also gets logged.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 753c8913 18-Nov-2009 Philipp Reisner <philipp.reisner@linbit.com>

drbd_req.c: use part_[inc|dec]_in_flight()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# 83c38830 02-Nov-2009 Lars Ellenberg <lars.ellenberg@linbit.com>

drbd: performance - don't lose unplug events

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>


# a870a3a4 28-Oct-2009 Jens Axboe <jens.axboe@oracle.com>

drbd: fix in_flight rw indexing

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 25d2d4ed 05-Oct-2009 Jens Axboe <jens.axboe@oracle.com>

drbd: fixup for reverted dual in_flight patch

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# 6a0afdf5 01-Oct-2009 Jens Axboe <jens.axboe@oracle.com>

drbd: remove tracing bits

They should be reimplemented in the current scheme.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>


# ab8fafc2 28-Sep-2009 Lars Ellenberg <lars.ellenberg@linbit.com>

dropping unneeded include autoconf.h

It is force-included on the gcc command line since at least 2.6.15.
Explicit include lines seem to break compilation now in certain configurations.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>


# b411b363 25-Sep-2009 Philipp Reisner <philipp.reisner@linbit.com>

The DRBD driver

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>