History log of /linux-master/fs/xfs/xfs_bio_io.c
Revision Date Author Comments
# d03025ae 14-Jul-2022 Bart Van Assche <bvanassche@acm.org>

fs/xfs: Use the enum req_op and blk_opf_t types

Improve static type checking by using the enum req_op type for variables
that represent a request operation and the new blk_opf_t type for the
combination of a request operation with request flags.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-63-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 919edbad 29-Mar-2022 Dave Chinner <dchinner@redhat.com>

xfs: drop async cache flushes from CIL commits.

Jan Kara reported a performance regression in dbench that he
bisected down to commit bad77c375e8d ("xfs: CIL checkpoint
flushes caches unconditionally").

Whilst developing the journal flush/fua optimisations this cache was
part of, it appeared to made a significant difference to
performance. However, now that this patchset has settled and all the
correctness issues fixed, there does not appear to be any
significant performance benefit to asynchronous cache flushes.

In fact, the opposite is true on some storage types and workloads,
where additional cache flushes that can occur from fsync heavy
workloads have measurable and significant impact on overall
throughput.

Local dbench testing shows little difference on dbench runs with
sync vs async cache flushes on either fast or slow SSD storage, and
no difference in streaming concurrent async transaction workloads
like fs-mark.

Fast NVME storage.

From `dbench -t 30`, CIL scale:

clients async sync
BW Latency BW Latency
1 935.18 0.855 915.64 0.903
8 2404.51 6.873 2341.77 6.511
16 3003.42 6.460 2931.57 6.529
32 3697.23 7.939 3596.28 7.894
128 7237.43 15.495 7217.74 11.588
512 5079.24 90.587 5167.08 95.822

fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize

create chown unlink
async 1m41s 1m16s 2m03s
sync 1m40s 1m19s 1m54s

Slower SATA SSD storage:

From `dbench -t 30`, CIL scale:

clients async sync
BW Latency BW Latency
1 78.59 15.792 83.78 10.729
8 367.88 92.067 404.63 59.943
16 564.51 72.524 602.71 76.089
32 831.66 105.984 870.26 110.482
128 1659.76 102.969 1624.73 91.356
512 2135.91 223.054 2603.07 161.160

fsmark, 16 threads, create w/32k logbsize

create unlink
async 5m06s 4m15s
sync 5m00s 4m22s

And on Jan's test machine:

5.18-rc8-vanilla 5.18-rc8-patched
Amean 1 71.22 ( 0.00%) 64.94 * 8.81%*
Amean 2 93.03 ( 0.00%) 84.80 * 8.85%*
Amean 4 150.54 ( 0.00%) 137.51 * 8.66%*
Amean 8 252.53 ( 0.00%) 242.24 * 4.08%*
Amean 16 454.13 ( 0.00%) 439.08 * 3.31%*
Amean 32 835.24 ( 0.00%) 829.74 * 0.66%*
Amean 64 1740.59 ( 0.00%) 1686.73 * 3.09%*

Performance and cache flush behaviour is restored to pre-regression
levels.

As such, we can now consider the async cache flush mechanism an
unnecessary exercise in premature optimisation and hence we can
now remove it and the infrastructure it requires completely.

Fixes: bad77c375e8d ("xfs: CIL checkpoint flushes caches unconditionally")
Reported-and-tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 49add496 24-Jan-2022 Christoph Hellwig <hch@lst.de>

block: pass a block_device and opf to bio_init

Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 07888c66 24-Jan-2022 Christoph Hellwig <hch@lst.de>

block: pass a block_device and opf to bio_alloc

Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment. NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 0431d926 18-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: async blkdev cache flush

The new checkpoint cache flush mechanism requires us to issue an
unconditional cache flush before we start a new checkpoint. We don't
want to block for this if we can help it, and we have a fair chunk
of CPU work to do between starting the checkpoint and issuing the
first journal IO.

Hence it makes sense to amortise the latency cost of the cache flush
by issuing it asynchronously and then waiting for it only when we
need to issue the first IO in the transaction.

To do this, we need async cache flush primitives to submit the cache
flush bio and to wait on it. The block layer has no such primitives
for filesystems, so roll our own for the moment.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 5f7136db 28-Jan-2021 Matthew Wilcox (Oracle) <willy@infradead.org>

block: Add bio_max_segs

It's often inconvenient to use BIO_MAX_PAGES due to min() requiring the
sign to be the same. Introduce bio_max_segs() and change BIO_MAX_PAGES to
be unsigned to make it easier for the users.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 488ca3d8 09-Jul-2019 Christoph Hellwig <hch@lst.de>

xfs: chain bios the right way around in xfs_rw_bdev

We need to chain the earlier bios to the later ones, so that
submit_bio_wait waits on the bio that all the completions are
dispatched to.

Fixes: 6ad5b3255b9e ("xfs: use bios directly to read and write the log recovery buffers")
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 6ad5b325 28-Jun-2019 Christoph Hellwig <hch@lst.de>

xfs: use bios directly to read and write the log recovery buffers

The xfs_buf structure is basically used as a glorified container for
a memory allocation in the log recovery code. Replace it with a
call to kmem_alloc_large and a simple abstraction to read into or
write from it synchronously using chained bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>