History log of /linux-master/fs/gfs2/file.c
Revision Date Author Comments
# a6bf23e1 31-Jan-2024 Jeff Layton <jlayton@kernel.org>

gfs2: adapt to breakup of struct file_lock

Most of the existing APIs have remained the same, but subsystems that
access file_lock fields directly need to reach into struct
file_lock_core now.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-38-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# a69ce85e 31-Jan-2024 Jeff Layton <jlayton@kernel.org>

filelock: split common fields into struct file_lock_core

In a future patch, we're going to split file leases into their own
structure. Since a lot of the underlying machinery uses the same fields
move those into a new file_lock_core, and embed that inside struct
file_lock.

For now, add some macros to ensure that we can continue to build while
the conversion is in progress.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
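
The split described in this commit can be pictured with a small user-space sketch. All names below (lock_core_sketch, lock_sketch, lk_type, lk_flags) are hypothetical and are not the kernel's definitions; the sketch only assumes that the shared fields move into an embedded struct and that transitional macros keep old-style field accesses compiling.

```c
#include <assert.h>

/* Hypothetical names throughout; these are not the kernel's definitions. */
struct lock_core_sketch {
	int lkc_type;            /* fields shared by locks and leases */
	unsigned int lkc_flags;
};

struct lock_sketch {
	struct lock_core_sketch core;   /* common part, embedded by value */
	long long start;                /* lock-specific fields stay outside */
	long long end;
};

/* Transitional aliases so that old-style accesses keep compiling. */
#define lk_type  core.lkc_type
#define lk_flags core.lkc_flags

static int lock_type_old_style(const struct lock_sketch *lk)
{
	assert(lk);
	return lk->lk_type;             /* expands to lk->core.lkc_type */
}

static int lock_type_new_style(const struct lock_sketch *lk)
{
	assert(lk);
	return lk->core.lkc_type;       /* what converted code writes */
}
```

Once every user has been converted to the new-style access, the alias macros can be dropped.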


# b4c6d52d 31-Jan-2024 Jeff Layton <jlayton@kernel.org>

gfs2: convert to using new filelock helpers

Convert to using the new file locking helper functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20240131-flsplit-v3-9-c6129007ee8d@kernel.org
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 4d927b03 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn

This function checks whether the filesystem has been marked to be
withdrawn eventually or has been withdrawn already. Rename this
function to avoid confusing code like checking for gfs2_withdrawing()
when gfs2_withdrawn() has already returned true.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 015af1af 20-Dec-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Mark withdraws as unlikely

Mark the gfs2_withdrawn(), gfs2_withdrawing(), and
gfs2_withdraw_in_prog() inline functions as likely to return %false.
This allows us to get rid of likely() and unlikely() annotations at the
call sites of those functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
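
A minimal user-space sketch of the idea, assuming a GCC- or Clang-compatible compiler (__builtin_expect is a compiler builtin, not standard C). The struct, flag bit, and function names are hypothetical stand-ins, not the gfs2 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's unlikely(); relies on a GCC/Clang builtin. */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

#define SB_WITHDRAWN_SKETCH 0x1UL   /* hypothetical flag bit */

struct sb_sketch {
	unsigned long flags;
};

/* The branch hint lives here, once, instead of at every call site. */
static inline bool withdrawn_sketch(const struct sb_sketch *sb)
{
	assert(sb);
	return unlikely_sketch(sb->flags & SB_WITHDRAWN_SKETCH);
}

/* A caller no longer needs its own unlikely() wrapper. */
static int do_io_sketch(const struct sb_sketch *sb)
{
	if (withdrawn_sketch(sb))
		return -5;   /* the kernel would return -EIO; plain number here */
	return 0;
}
```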


# f7e4c610 20-Oct-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up gfs2_alloc_parms initializers

When initializing a struct, all fields that are not explicitly mentioned
are zeroed out already.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
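
The C rule this cleanup relies on can be shown in isolation. The struct below is a hypothetical stand-in, not the real gfs2_alloc_parms; the point is only that members left out of an initializer list are zero-initialized.

```c
#include <assert.h>

/* Hypothetical stand-in for a parameter struct. */
struct alloc_parms_sketch {
	unsigned long target;
	unsigned int aflags;
	unsigned long min_target;
};

/* Only .target is named; C guarantees the remaining members are zero. */
static struct alloc_parms_sketch make_parms_sketch(unsigned long target)
{
	struct alloc_parms_sketch ap = { .target = target };

	assert(ap.aflags == 0 && ap.min_target == 0);
	return ap;
}
```

Writing `.aflags = 0` explicitly in such an initializer is therefore redundant.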


# 089f4eb0 20-Sep-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't update inode timestamps for direct writes

During direct reads and writes, the caller is holding the inode glock in
deferred mode, which doesn't allow metadata updates. However, a previous
change caused callers to update the inode modification time before carrying out
direct writes, which caused the inode glock to be converted to exclusive mode
for the timestamp update, only to be immediately converted back to deferred
mode for the direct write. This locks out other direct readers and writers
and wreaks havoc on performance.

Fix that by reverting to not updating the inode modification time for direct
writes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# dc52cd2e 20-Jul-2023 Alexander Aring <aahringo@redhat.com>

fs: dlm: fix F_CANCELLK to cancel pending request

This patch fixes the current handling of F_CANCELLK, which just did an
unlock. An unlock makes sense for a non-blocking lock request, but a
blocking lock request that has not been granted yet needs to be
cancelled instead. Fix this by first trying to cancel the lock request
and, if that fails, unlocking the lock, which has apparently been
granted.

Note: currently the nfs locking handling was disabled by commit
40595cdc93ed ("nfs: block notification on fs with its own ->lock").
However, DLM was never updated to account for this change. Future
patches will try to fix lockd lock requests for DLM. This patch is
currently assuming the upstream DLM lockd handling is correct.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>


# 2ba39cc4 01-Aug-2023 Christoph Hellwig <hch@lst.de>

fs: rename and move block_page_mkwrite_return

block_page_mkwrite_return is neither block nor mkwrite specific, and
should not be under CONFIG_BLOCK. Move it to mm.h and rename it to
vmf_fs_error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 8a8b8d91 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

gfs2: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-45-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 0be84321 02-Aug-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: Don't use filemap_splice_read

Starting with patch 2cb1e08985, gfs2 started using the new function
filemap_splice_read rather than the old (and subsequently deleted)
function generic_file_splice_read.

filemap_splice_read works by taking references to a number of folios in
the page cache and splicing those folios into a pipe. The folios are
then read from the pipe and the folio references are dropped. This can
take an arbitrary amount of time. We cannot allow that in gfs2 because
those folio references will pin the inode glock to the node and prevent
it from being demoted, which can lead to cluster-wide deadlocks.

Instead, use copy_splice_read.

(In addition, the old generic_file_splice_read called into ->read_iter,
which called gfs2_file_read_iter, which took the inode glock during the
operation. The new filemap_splice_read interface does not take the
inode glock anymore. This is fixable, but it still wouldn't prevent
cluster-wide deadlocks.)

Fixes: 2cb1e08985e3 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c8ed1b35 11-Jun-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: Fix duplicate should_fault_in_pages() call

In gfs2_file_buffered_write(), we currently jump from the second call of
function should_fault_in_pages() to above the first call, so
should_fault_in_pages() is getting called twice in a row, causing it to
accidentally fall back to single-page writes rather than trying the more
efficient multi-page writes first.

Fix that by moving the retry label to the correct place, behind the
first call to should_fault_in_pages().

Fixes: e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
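
A simplified user-space model of why two back-to-back calls matter. It assumes (this is not stated in the commit message) that the helper remembers the remaining byte count from the previous call and shrinks the window to one page when it sees no progress. The names and the 8-page window are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SKETCH   4096u
#define WINDOW_SKETCH (8u * PAGE_SKETCH)   /* hypothetical multi-page window */

/*
 * Simplified model: if no progress was made since the previous call
 * (same remaining count), fall back to a single page.
 */
static bool should_fault_in_sketch(size_t count, size_t *prev_count,
				   size_t *window)
{
	assert(prev_count && window);
	if (count == 0)
		return false;
	*window = (count == *prev_count) ? PAGE_SKETCH : WINDOW_SKETCH;
	*prev_count = count;
	return true;
}
```

In this model, calling the helper twice with the same count and no write in between always yields the one-page window, which is the behavior a misplaced retry label would trigger.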


# 7b7b06d5 11-Jun-2023 Christoph Hellwig <hch@lst.de>

gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method

Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag"), file
systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
wiring up a dummy direct_IO method to indicate support for direct I/O.

Remove .direct_IO from gfs2_aops and set FMODE_CAN_ODIRECT in
gfs2_open_common for regular files that do not use data journalling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 219580ee 01-Jun-2023 Christoph Hellwig <hch@lst.de>

iomap: update ki_pos in iomap_file_buffered_write

All callers of iomap_file_buffered_write need to update ki_pos, so move
it into common code.

Link: https://lkml.kernel.org/r/20230601145904.1385409-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 0d625446 01-Jun-2023 Christoph Hellwig <hch@lst.de>

backing_dev: remove current->backing_dev_info

Patch series "cleanup the filemap / direct I/O interaction", v4.

This series cleans up some of the generic write helper calling conventions
and the page cache writeback / invalidation for direct I/O. This is a
spinoff from the no-bufferhead kernel project, for which we'll want to
use an iomap based buffered write path in the block layer.


This patch (of 12):

The last user of current->backing_dev_info disappeared in commit
b9b1335e6403 ("remove bdi_congested() and wb_congested() and related
functions"). Remove the field and all assignments to it.

Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de
Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# fa58cc88 31-May-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't get stuck writing page onto itself under direct I/O

When a direct I/O write is performed, iomap_dio_rw() invalidates the
part of the page cache which the write is going to before carrying out
the write. In the odd case, the direct I/O write will be reading from
the same page it is writing to. gfs2 carries out writes with page
faults disabled, so it should have been obvious that this page
invalidation can cause iomap_dio_rw() to never make any progress.
Currently, gfs2 will end up in an endless retry loop in
gfs2_file_direct_write() instead, though.

Break this endless loop by limiting the number of retries and falling
back to buffered I/O after that.

Also simplify should_fault_in_pages() slightly and add a comment to make
the above case easier to understand.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
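
The "limit the retries, then fall back" pattern can be sketched in user space as follows. The callback type, the enum, and the max_retries parameter are hypothetical; the commit message does not say which limit gfs2 uses.

```c
#include <assert.h>
#include <stdbool.h>

enum write_path_sketch { WROTE_DIRECT, FELL_BACK_TO_BUFFERED };

/* One direct I/O attempt; returns true once the write made it through. */
typedef bool (*attempt_fn_sketch)(void *ctx);

static enum write_path_sketch
write_with_retry_limit(attempt_fn_sketch attempt, void *ctx, int max_retries)
{
	assert(attempt);
	for (int tries = 0; tries <= max_retries; tries++) {
		if (attempt(ctx))
			return WROTE_DIRECT;
		/* real code would fault in user pages here before retrying */
	}
	return FELL_BACK_TO_BUFFERED;
}

/* Models the pathological case: the attempt never makes progress. */
static bool never_progresses(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return false;
}

/* Models the common case: the attempt succeeds on the second try. */
static bool succeeds_second_time(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) >= 2;
}
```

The bounded loop guarantees termination even when every attempt fails, at the cost of sometimes using the slower buffered path.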


# 2cb1e089 22-May-2023 David Howells <dhowells@redhat.com>

splice: Use filemap_splice_read() instead of generic_file_splice_read()

Replace pointers to generic_file_splice_read() with calls to
filemap_splice_read().

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: David Hildenbrand <david@redhat.com>
cc: John Hubbard <jhubbard@nvidia.com>
cc: linux-mm@kvack.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4609e1f1 12-Jan-2023 Christian Brauner <brauner@kernel.org>

fs: port ->permission() to pass mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 8782a9ae 12-Jan-2023 Christian Brauner <brauner@kernel.org>

fs: port ->fileattr_set() to pass mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevant on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 5970e15d 20-Nov-2022 Jeff Layton <jlayton@kernel.org>

filelock: move file locking definitions to separate header file

The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.

Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Steve French <stfrench@microsoft.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>


# 4ad02083 02-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make gfs2_glock_hold return its glock argument

This allows code like 'gl = gfs2_glock_hold(...)'.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
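
A user-space sketch of the calling convention this enables, using a hypothetical reference-counted object rather than a real glock.

```c
#include <assert.h>

struct ref_sketch {
	int refcount;
};

/* Take a reference and hand the same pointer back to the caller. */
static struct ref_sketch *ref_hold(struct ref_sketch *r)
{
	assert(r && r->refcount > 0);
	r->refcount++;
	return r;
}

struct user_sketch {
	struct ref_sketch *ref;
};

static void user_attach(struct user_sketch *u, struct ref_sketch *r)
{
	assert(u);
	u->ref = ref_hold(r);   /* take the reference and store it in one step */
}
```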


# b582d5f0 05-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Mark flock glock holders as GL_NOPID

Add the GL_NOPID flag for flock glock holders. Clean up the flag
setting code in do_flock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 56535dc6 23-Jun-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add flocks to glockfd debugfs file

Include flock glocks in the "glockfd" debugfs file. Those are similar to the
iopen glocks; while an open file is holding an flock, it is holding the file's
flock glock.

We cannot take f_fl_mutex in gfs2_glockfd_seq_show_flock() or else dumping the
"glockfd" file would block on flock operations. Instead, use the file->f_lock
spin lock to protect the f_fl_gh.gh_gl glock pointer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# fcb14cb1 22-May-2022 Al Viro <viro@zeniv.linux.org.uk>

new iov_iter flavour - ITER_UBUF

Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose the things like ->write_iter() et al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
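
The relationship between the predicates can be modelled with a plain enum. The names below are hypothetical stand-ins for the iterator types mentioned above; the sketch only shows that "user backed" is the union of the two user-memory flavours, while "is iovec" remains the narrower test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the iterator flavours. */
enum iter_kind_sketch { KIND_IOVEC, KIND_UBUF, KIND_KVEC, KIND_BVEC };

static bool kind_is_iovec(enum iter_kind_sketch k)
{
	return k == KIND_IOVEC;
}

static bool kind_is_ubuf(enum iter_kind_sketch k)
{
	return k == KIND_UBUF;
}

/* True for both user-memory flavours; use when deciding about dirtying. */
static bool kind_user_backed(enum iter_kind_sketch k)
{
	return kind_is_iovec(k) || kind_is_ubuf(k);
}
```

Code that goes on to walk a segment array still needs the narrower iovec check, since a single user buffer has no such array.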


# ab37c305 02-Jun-2022 Minghao Chi <chi.minghao@zte.com.cn>

gfs2: Remove redundant NULL check before kfree

kfree on NULL pointer is a no-op.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
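
The user-space analogue is that free() on a null pointer is also defined to do nothing, so the same cleanup applies. The struct below is a hypothetical example, not gfs2 code.

```c
#include <assert.h>
#include <stdlib.h>

struct holder_buf_sketch {
	char *buf;   /* may be NULL if nothing was ever allocated */
};

/* No "if (h->buf)" guard needed: free(NULL) is a no-op. */
static void holder_buf_release(struct holder_buf_sketch *h)
{
	assert(h);
	free(h->buf);
	h->buf = NULL;
}
```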


# 53bb540f 26-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Explain some direct I/O oddities

Add some comments explaining the oddities of partial direct I/O reads
and writes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 786f847f 05-May-2022 Christoph Hellwig <hch@lst.de>

iomap: add per-iomap_iter private data

Allow the file system to keep state for all iterations. For now only
wire it up for direct I/O as there is an immediate need for it there.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>


# e1fa9ea8 11-May-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Stop using glock holder auto-demotion for now

We're having unresolved issues with the glock holder auto-demotion mechanism
introduced in commit dc732906c245. This mechanism was assumed to be essential
for avoiding frequent short reads and writes until commit 296abc0d91d8
("gfs2: No short reads or writes upon glock contention"). Since then,
when the inode glock is lost, it is simply re-acquired and the operation
is resumed. This means that apart from the performance penalty, we
might as well drop the inode glock before faulting in pages, and
re-acquire it afterwards.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# fa5dfa64 04-May-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: buffered write prefaulting

In gfs2_file_buffered_write, to increase the likelihood that all the
user memory we're trying to write will be resident in memory, carry out
the write in chunks and fault in each chunk of user memory before trying
to write it. Otherwise, some workloads will trigger frequent short
"internal" writes, causing filesystem blocks to be allocated and then
partially deallocated again when writing into holes, which is wasteful
and breaks reservations.

Neither the chunked writes nor any of the short "internal" writes are
user visible.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 324d116c 05-May-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Align read and write chunks to the page cache

Align the chunks that reads and writes are carried out in to the page
cache rather than the user buffers. This will be more efficient in
general, especially for allocating writes. Optimizing the case that the
user buffer is gfs2 backed isn't very useful; we only need to make sure
we won't deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
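
One way to picture page-cache alignment is a helper that sizes each chunk so that it ends on a page boundary of the file position. This is a sketch under assumptions: the 4096-byte page size, the function name, and the window parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SKETCH 4096u   /* assumed page size */

/*
 * Size of the next chunk for an I/O starting at file position pos.
 * The chunk ends on a page boundary of the file, regardless of how
 * the user buffer is aligned.
 */
static size_t next_chunk_sketch(size_t pos, size_t remaining, size_t window)
{
	size_t chunk;

	assert(window >= PAGE_SKETCH && window % PAGE_SKETCH == 0);
	chunk = window - (pos % PAGE_SKETCH);
	return remaining < chunk ? remaining : chunk;
}
```

After the first, possibly shorter, chunk, every following chunk starts page-aligned in the file.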


# 72382264 04-May-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Pull return value test out of should_fault_in_pages

Pull the return value test of the previous read or write operation out
of should_fault_in_pages(). In a following patch, we'll fault in pages
before the I/O and there will be no return value to check.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6d22ff47 04-May-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up use of fault_in_iov_iter_{read,write}able

No need to store the return value of the fault_in functions in separate
variables.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 42e4c3bd 27-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Variable rename

Instead of counting the number of bytes read from the filesystem,
functions gfs2_file_direct_read and gfs2_file_read_iter count the number
of bytes written into the user buffer. Conversely, functions
gfs2_file_direct_write and gfs2_file_buffered_write count the number of
bytes read from the user buffer. This is nothing but confusing, so
change the read functions to count how many bytes they have read, and
the write functions to count how many bytes they have written.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 296abc0d 28-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: No short reads or writes upon glock contention

Commit 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered
I/O") changed gfs2_file_read_iter() and gfs2_file_buffered_write() to
allow dropping the inode glock while faulting in user buffers. When the
lock was dropped, a short result was returned to indicate that the
operation was interrupted.

As pointed out by Linus (see the link below), this behavior is broken
and the operations should always re-acquire the inode glock and resume
the operation instead.

Link: https://lore.kernel.org/lkml/CAHk-=whaz-g_nOOoo8RRiWNjnv2R+h6_xk2F1J4TuSRxk1MtLw@mail.gmail.com/
Fixes: 00bfe02f4796 ("gfs2: Fix mmap + page fault deadlocks for buffered I/O")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e57f9af7 19-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't re-check for write past EOF unnecessarily

Only re-check for direct I/O writes past the end of the file after
re-acquiring the inode glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3bde4c48 24-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Make sure not to return short direct writes

When direct writes fail with -ENOTBLK because we're writing into a
hole (gfs2_iomap_begin()) or because of a page invalidation failure
(iomap_dio_rw()), we're falling back to buffered writes. In that case,
when we lose the inode glock in gfs2_file_buffered_write(), we want to
re-acquire it instead of returning a short write.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 11661835 24-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove dead code in gfs2_file_read_iter

Function iomap_dio_rw() only returns -ENOTBLK for write requests and
gfs2_file_direct_read() no longer returns -ENOTBLK since commit
1d45bb7f9d2a5 ("gfs2: Use iomap for stuffed direct I/O reads"), so there
is no need to check for -ENOTBLK in gfs2_file_read_iter() anymore.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 46f3e042 17-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix gfs2_file_buffered_write endless loop workaround

Since commit 554c577cee95b, gfs2_file_buffered_write() can accidentally
return a truncated iov_iter, which might confuse callers. Fix that.

Fixes: 554c577cee95b ("gfs2: Prevent endless loops in gfs2_file_buffered_write")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 124c458a 17-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor retry logic cleanup

Clean up the retry logic in the read and write functions somewhat.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 52f3f033 14-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Disable page faults during lockless buffered reads

During lockless buffered reads, filemap_read() holds page cache page
references while trying to copy data to the user-space buffer. The
calling process isn't holding the inode glock, but the page references
it holds prevent those pages from being removed from the page cache, and
that prevents the underlying inode glock from being moved to another
node. Thus, we can end up in the same kinds of distributed deadlock
situations as with normal (non-lockless) buffered reads.

Fix that by disabling page faults during lockless reads as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# bb7f5d96 07-Mar-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix should_fault_in_pages() logic

Fix the fault-in window size logic:
* Use a maximum window size of 1 MiB instead of BIO_MAX_VECS * PAGE_SIZE.
The previous window size was always one page because the pages variable
was accidentally being defined and then redefined in
should_fault_in_pages().
* The nr_dirtied heuristic for guessing when there might be memory
pressure often results in very small window sizes. Don't let
nr_dirtied drop below 8 pages (as btrfs does).
* Compute the window size in units of bytes, not pages.
* Account for page overlap (unaligned iterators).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
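
The four points above can be combined into one small calculation. This is a user-space sketch, not the gfs2 function: the 4096-byte page size and all names are assumptions, and pages_hint stands in for whatever memory-pressure heuristic supplies the page count.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SKETCH       4096u               /* assumed page size */
#define MAX_WINDOW_SKETCH (1024u * 1024u)     /* 1 MiB upper bound */
#define MIN_PAGES_SKETCH  8u                  /* lower bound on the hint */

/*
 * Window size in bytes. pages_hint is the heuristic page count;
 * start_offset is where the data starts within its first page.
 */
static size_t fault_in_window_sketch(size_t pages_hint, size_t start_offset)
{
	size_t pages = pages_hint;
	size_t size;

	if (pages < MIN_PAGES_SKETCH)
		pages = MIN_PAGES_SKETCH;
	if (pages > MAX_WINDOW_SKETCH / PAGE_SKETCH)
		pages = MAX_WINDOW_SKETCH / PAGE_SKETCH;
	size = pages * PAGE_SKETCH;                 /* bytes, not pages */
	assert(size <= MAX_WINDOW_SKETCH);
	return size - (start_offset % PAGE_SKETCH); /* account for overlap */
}
```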


# a4e8145e 14-Jan-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Initialize gh_error in gfs2_glock_nq

The gh_error field of a glock holder is initialized to zero in
gfs2_holder_init(). When a locking operation fails, gh_error is set to
an error code; when it succeeds, the gh_error value is left unchanged.
The field isn't initialized in gfs2_holder_reinit(), which is a problem.
Instead of fixing that directly, initialize gh_error in gfs2_glock_nq().
That also obsoletes the assignment in do_flock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
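
The stale-error problem and the chosen fix can be modelled in user space. The struct and functions are hypothetical; simulated_result stands in for the outcome of a real locking operation.

```c
#include <assert.h>

struct holder_sketch {
	int state;
	int error;
};

static void holder_init_sketch(struct holder_sketch *h, int state)
{
	h->state = state;
	h->error = 0;
}

/* Reuses the holder but, as in the description, leaves error untouched. */
static void holder_reinit_sketch(struct holder_sketch *h, int state)
{
	h->state = state;
}

/* Clearing error on every enqueue makes a stale value impossible. */
static int holder_enqueue_sketch(struct holder_sketch *h, int simulated_result)
{
	assert(h);
	h->error = 0;
	if (simulated_result)
		h->error = simulated_result;
	return h->error;
}
```

With the reset in the enqueue path, callers no longer need to clear the field themselves before reusing a holder.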


# 7336905a 10-Dec-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_setattr_size error path fix

When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get
rid of any reservations the inode may have. Instead, it should pass in
the inode's write count as the second parameter to allow
gfs2_rs_delete() to figure out if the inode has any writers left.

In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left
where we know that there can be no other users of the inode. Replace
those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write
count check.

With that, gfs2_rs_delete() is only called with the inode's actual write
count, so get rid of the second parameter.

Fixes: a097dc7e24cb ("GFS2: Make rgrp reservations part of the gfs2_inode structure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d3add1a9 18-Jan-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Fix gfs2_release for non-writers regression

When a file is opened for writing, the vfs code (do_dentry_open)
calls get_write_access for the inode, thus incrementing the inode's write
count. That writer normally then creates a multi-block reservation for
the inode (i_res) that can be re-used by other writers, which speeds up
writes for applications that stupidly loop on open/write/close.
When the writes are all done, the multi-block reservation should be
deleted when the file is closed by the last "writer."

Commit 0ec9b9ea4f83 broke that concept when it moved the call to
gfs2_rs_delete before the check for FMODE_WRITE. Non-writers have no
business removing the multi-block reservations of writers. In fact, if
someone opens and closes the file for RO while a writer has a
multi-block reservation, the RO closer will delete the reservation
midway through the write, and this results in:

kernel BUG at fs/gfs2/rgrp.c:677! (or thereabouts) which is:
BUG_ON(rs->rs_requested); from function gfs2_rs_deltree.

This patch moves the check back inside the check for FMODE_WRITE.

Fixes: 0ec9b9ea4f83 ("gfs2: Check for active reservation in gfs2_release")
Cc: stable@vger.kernel.org # v5.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
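
The intended rule (only a closing writer may drop the reservation, and only the last one) can be sketched as below. This is a simplified model with hypothetical names; in particular, it decrements the write count inside the release function for brevity, which is not how the kernel splits that work.

```c
#include <assert.h>
#include <stdbool.h>

struct inode_rs_sketch {
	int writecount;         /* number of open writers */
	bool has_reservation;   /* multi-block reservation present */
};

static void release_sketch(struct inode_rs_sketch *ip, bool opened_for_write)
{
	assert(ip);
	if (!opened_for_write)
		return;                       /* read-only closers leave it alone */
	if (ip->writecount <= 1)
		ip->has_reservation = false;  /* last writer drops the reservation */
	ip->writecount--;
}
```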


# 554c577c 09-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Prevent endless loops in gfs2_file_buffered_write

Currently, instead of performing a short write,
iomap_file_buffered_write will fail when part of its iov iterator cannot
be read. In contrast, gfs2_file_buffered_write will loop around if it
can read part of the iov iterator, so we can end up in an endless loop.

This should be fixed in iomap_file_buffered_write (and also
generic_perform_write), but this comes a bit late in the 5.16
development cycle, so work around it in the filesystem by
trimming the iov iterator to the known-good size for now.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 9642c8c4 03-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Only dereference i->iov when iter_is_iovec(i)

Only dereference i->iov after establishing that i is of type ITER_IOVEC.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# e34e6f81 20-Sep-2021 Tim Gardner <tim.gardner@canonical.com>

gfs2: Fix unused value warning in do_gfs2_set_flags()

Coverity complains of an unused value:

CID 119623 (#1 of 1): Unused value (UNUSED_VALUE)
assigned_value: Assigning value -1 to error here, but that stored value is
overwritten before it can be used.
237 error = -EPERM;

Fix it by removing the assignment.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a500bd31 24-Aug-2021 Bob Peterson <rpeterso@redhat.com>

gfs2: Allow append and immutable bits to coexist

Before this patch, function do_gfs2_set_flags checked if the append
and immutable flags were being set while already set. If so, error -EPERM
was given. There's no reason why these two flags should be mutually
exclusive, and if you set them separately, you will, in essence, set
one while it is already set. For example:

chattr +a /mnt/gfs2/file1
chattr +i /mnt/gfs2/file1

The first command sets the append-only flag. Since they are additive,
the second command sets the immutable flag AND append-only flag,
since they both coexist in i_diskflags. So the second command should
not return an error. This bug caused xfstests generic/545 to fail.

This patch simply removes the invalid checks.
I also eliminated an unused parm from do_gfs2_set_flags.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
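
A sketch of mask-based flag updates without a mutual-exclusion check, mirroring the two chattr commands above. The flag values and the function are hypothetical, not the on-disk gfs2 flags.

```c
#include <assert.h>

#define FL_APPEND_SKETCH    0x1u   /* hypothetical bit values */
#define FL_IMMUTABLE_SKETCH 0x2u

/* Replace only the bits selected by mask; other bits keep their value. */
static unsigned int set_flags_sketch(unsigned int cur, unsigned int req,
				     unsigned int mask)
{
	unsigned int updated = (cur & ~mask) | (req & mask);

	assert((updated & ~mask) == (cur & ~mask));
	return updated;
}
```

Setting append first and then requesting append plus immutable simply yields both bits; a flag that is already set is not an error.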


# b01b2d72 12-Jul-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix mmap + page fault deadlocks for direct I/O

Also disable page faults during direct I/O requests and implement a
similar kind of retry logic as in the buffered I/O case.

The retry logic in the direct I/O case differs from the buffered I/O
case in the following way: direct I/O doesn't provide the kinds of
consistency guarantees between concurrent reads and writes that buffered
I/O provides, so once we lose the inode glock while faulting in user
pages, we always resume the operation. We never need to return a
partial read or write.

This locking problem was originally reported by Jan Kara. Linus came up
with the idea of disabling page faults. Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4fdccaa0 23-Jul-2021 Andreas Gruenbacher <agruenba@redhat.com>

iomap: Add done_before argument to iomap_dio_rw

Add a done_before argument to iomap_dio_rw that indicates how much of
the request has already been transferred. When the request succeeds, we
report that done_before additional bytes were transferred. This is
useful for finishing a request asynchronously when part of the request
has already been completed synchronously.

We'll use that to allow iomap_dio_rw to be used with page faults
disabled: when a page fault occurs while submitting a request, we
synchronously complete the part of the request that has already been
submitted. The caller can then take care of the page fault and call
iomap_dio_rw again for the rest of the request, passing in the number of
bytes already transferred.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
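
The calling pattern described above might look roughly like the following user-space sketch. demo_transfer and demo_rw are hypothetical stand-ins, not iomap_dio_rw itself; the sketch only models "stop early at a fault, then call again with the bytes already done".

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a transfer routine running with page faults
 * disabled: it moves at most fault_at bytes, then stops early. The
 * return value includes done_before, the bytes moved by earlier calls.
 */
static size_t demo_transfer(size_t len, size_t fault_at, size_t done_before)
{
	size_t done = len < fault_at ? len : fault_at;

	return done_before + done;
}

/* Caller loop: handle the fault, then retry for the rest of the request. */
static size_t demo_rw(size_t total, size_t fault_at)
{
	size_t done = 0;

	assert(fault_at > 0);	/* otherwise no progress would be made */
	while (done < total) {
		/* Real code would fault in the user pages here. */
		done = demo_transfer(total - done, fault_at, done);
	}
	return done;
}
```

The final return value is the size of the whole request rather than only the last chunk, because each call folds the earlier progress into its result.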


# 00bfe02f 18-Oct-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix mmap + page fault deadlocks for buffered I/O

In the .read_iter and .write_iter file operations, we're accessing
user-space memory while holding the inode glock. There is a possibility
that the memory is mapped to the same file, in which case we'd recurse
on the same glock.

We could detect and work around this simple case of recursive locking,
but more complex scenarios exist that involve multiple glocks,
processes, and cluster nodes, and working around all of those cases
isn't practical or even possible.

Avoid these kinds of problems by disabling page faults while holding the
inode glock. If a page fault would occur, we either end up with a
partial read or write or with -EFAULT if nothing could be read or
written. In either case, we know that we're not done with the
operation, so we indicate that we're willing to give up the inode glock
and then we fault in the missing pages. If that made us lose the inode
glock, we return a partial read or write. Otherwise, we resume the
operation.

This locking problem was originally reported by Jan Kara. Linus came up
with the idea of disabling page faults. Many thanks to Al Viro and
Matthew Wilcox for their feedback.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
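
The decision made after a short read or write in the buffered case can be sketched as below. This is a simplified model with hypothetical names (demo_action, demo_next_step); it leaves out the -EFAULT case where nothing at all was transferred and does not reflect the actual gfs2 control flow in detail.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_action {
	DEMO_DONE,		/* request fully satisfied */
	DEMO_RETURN_PARTIAL,	/* hand back what was transferred so far */
	DEMO_RESUME		/* continue with the remaining bytes */
};

/*
 * Decide what to do after a copy attempt made with page faults disabled.
 * lost_glock says whether the inode glock was given up while the
 * missing user pages were being faulted in.
 */
static enum demo_action demo_next_step(long requested, long transferred,
				       bool lost_glock)
{
	assert(transferred >= 0 && transferred <= requested);
	if (transferred == requested)
		return DEMO_DONE;
	return lost_glock ? DEMO_RETURN_PARTIAL : DEMO_RESUME;
}
```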


# 1b223f70 25-Aug-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Eliminate ip->i_gh

Now that gfs2_file_buffered_write is the only remaining user of
ip->i_gh, we can move the glock holder to the stack (or rather, use the
one we already have on the stack); there is no need for keeping the
holder in the inode anymore.

This is slightly complicated by the fact that we're using ip->i_gh for
the statfs inode in gfs2_file_buffered_write as well. Writing to the
statfs inode isn't very common, so allocate the statfs holder
dynamically when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b924bdab 11-Aug-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Move the inode glock locking to gfs2_file_buffered_write

So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages. It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.

Move the locking into gfs2_file_buffered_write instead. We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2eb7509a 13-May-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add wrapper for iomap_file_buffered_write

Add a wrapper around iomap_file_buffered_write. We'll add code for when
the operation needs to be retried here later.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3e08773c 12-Oct-2021 Christoph Hellwig <hch@lst.de>

block: switch polling to be bio based

Replace the blk_poll interface that requires the caller to keep a queue
and cookie from the submissions with polling based on the bio.

Polling for the bio itself leads to a few advantages:

- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie
separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to
support polling of bios remapped by stacking drivers
- a lot of code to propagate the cookie back up the submission path can
be removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 90f7d7a0 10-Sep-2021 Jeff Layton <jlayton@kernel.org>

locks: remove LOCK_MAND flock lock support

As best I can tell, the logic for these has been broken for a long time
(at least before the move to git), such that they never conflict with
anything. Also, nothing checks for these flags and prevents opens or
read/write behavior on the files. They don't seem to do anything.

Given that, we can rip these symbols out of the kernel, and just make
flock(2) return 0 when LOCK_MAND is set in order to preserve existing
behavior.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@kernel.org>


# f7e33bdb 19-Aug-2021 Jeff Layton <jlayton@kernel.org>

fs: remove mandatory file locking support

We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton <jlayton@kernel.org>


# 7a607a41 17-Jun-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up gfs2_unstuff_dinode

Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little. All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 64090cbe 17-Jun-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Unstuff before locking page in gfs2_page_mkwrite

In gfs2_page_mkwrite, unstuff inodes before locking the page. That
way, we won't have to pass in the locked page to gfs2_unstuff_inode,
and gfs2_unstuff_inode can look up and lock the page itself.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0fc3bcd6 13-May-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up the error handling in gfs2_page_mkwrite

We're setting an error number so that block_page_mkwrite_return
translates it into the corresponding VM_FAULT_* code in several places,
but this is getting confusing, so set the VM_FAULT_* codes directly
instead. (No change in functionality.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d3c51c55 21-Jun-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix underflow in gfs2_page_mkwrite

On filesystems with a block size smaller than PAGE_SIZE and non-empty
files smaller than PAGE_SIZE, gfs2_page_mkwrite could end up allocating
excess blocks beyond the end of the file, similar to fallocate. This
doesn't make sense; fix it.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Fixes: 184b4e60853d ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0f1616f6 20-May-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix do_gfs2_set_flags description

Commit 88b631cbfbeb ("gfs2: convert to fileattr") changed the argument list
without updating the description.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d5b81454 01-Jun-2021 Andreas Gruenbacher <agruenba@redhat.com>

Revert "gfs2: Fix mmap locking for write faults"

This reverts commit b7f55d928e75557295c1ac280c291b738905b6fb.

As explained by Linus in [*], write faults on a mmap region are reads
from a filesystem point of view, so taking the inode glock exclusively
on write faults is incorrect.

Instead, when a page is marked writable, the .page_mkwrite vm operation
will be called, which is where the exclusive lock taking needs to
happen. I got this wrong because of a broken test case that made me
believe .page_mkwrite isn't getting called when it actually is.

[*] https://lore.kernel.org/lkml/CAHk-=wj8EWr_D65i4oRSj2FTbrc6RdNydNNCGxeabRnwtoU=3Q@mail.gmail.com/

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b7f55d92 15-May-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix mmap locking for write faults

When a write fault occurs, we need to take the inode glock of the underlying
inode in exclusive mode. Otherwise, there's no guarantee that the dirty page
will be written back to disk.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 43a511c4 09-May-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Prevent direct-I/O write fallback errors from getting lost

When a direct I/O write fails entirely and falls back to buffered I/O and the
buffered I/O fails, the write failed with return value 0 instead of the error
number reported by the buffered I/O. Fix that.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
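
The kind of bug described above can be modelled with a small sketch. demo_combine is a hypothetical helper, and the rule it applies (report bytes already written if there are any, otherwise the error) is an assumption about the intended semantics, not a copy of the gfs2 code.

```c
#include <assert.h>

/*
 * Combine the result of a direct write with the result of the buffered
 * fallback. Negative values of buffered_ret stand for error numbers.
 */
static long demo_combine(long direct_written, long buffered_ret)
{
	assert(direct_written >= 0);
	if (buffered_ret < 0) {
		/* Report progress if there was any, else the error itself. */
		return direct_written ? direct_written : buffered_ret;
	}
	return direct_written + buffered_ret;
}
```

With nothing written directly and a failing fallback, the caller sees the error number rather than 0.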


# c551f66c 30-Mar-2021 Lee Jones <lee.jones@linaro.org>

gfs2: Fix a number of kernel-doc warnings

Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions. Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 54992257 27-Mar-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add new gfs2_iomap_get helper

Rename the current gfs2_iomap_get and gfs2_iomap_alloc functions to __*.
Add a new gfs2_iomap_get helper that doesn't expose struct metapath.
Rename gfs2_iomap_get_alloc to gfs2_iomap_alloc. Use the new helpers
where they make sense.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 88b631cb 07-Apr-2021 Miklos Szeredi <mszeredi@redhat.com>

gfs2: convert to fileattr

Use the fileattr API to let the VFS handle locking, permission checking and
conversion.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>


# 725d0e9d 02-Oct-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add per-reservation reserved block accounting

Add a rs_reserved field to struct gfs2_blkreserv to keep track of the number of
blocks reserved by this particular reservation, and a rd_reserved field to
struct gfs2_rgrpd to keep track of the total number of reserved blocks in the
resource group. Those blocks are exclusively reserved, as opposed to the
rs_requested / rd_requested blocks which are tracked in the reservation tree
(rd_rstree) and which can be stolen if necessary.

When making a reservation with gfs2_inplace_reserve, rs_reserved is set to
somewhere between ap->min_target and ap->target depending on the number of free
blocks in the resource group. When allocating blocks with gfs2_alloc_blocks,
rs_reserved is decremented accordingly. Eventually, any reserved but not
consumed blocks are returned to the resource group by gfs2_inplace_release.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
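
The accounting described above can be approximated by the following model. The struct and function names are hypothetical, and the rule for picking a value between min_target and target is an assumption made for illustration; the real gfs2 logic is more involved.

```c
#include <assert.h>
#include <stdint.h>

struct demo_rgrp { uint32_t free; uint32_t reserved; };	/* resource group */
struct demo_rsv  { uint32_t reserved; };		/* one reservation */

/* Reserve between min_target and target blocks, as availability allows. */
static int demo_inplace_reserve(struct demo_rgrp *rgd, struct demo_rsv *rs,
				uint32_t min_target, uint32_t target)
{
	uint32_t avail;

	assert(min_target <= target && rgd->reserved <= rgd->free);
	avail = rgd->free - rgd->reserved;
	if (avail < min_target)
		return -1;	/* not enough unreserved blocks */
	rs->reserved = avail < target ? avail : target;
	rgd->reserved += rs->reserved;
	return 0;
}

/* Allocating n blocks consumes part of the reservation. */
static void demo_alloc_blocks(struct demo_rgrp *rgd, struct demo_rsv *rs,
			      uint32_t n)
{
	assert(n <= rs->reserved);
	rs->reserved -= n;
	rgd->reserved -= n;
	rgd->free -= n;
}

/* Return reserved but unconsumed blocks to the resource group. */
static void demo_inplace_release(struct demo_rgrp *rgd, struct demo_rsv *rs)
{
	rgd->reserved -= rs->reserved;
	rs->reserved = 0;
}
```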


# 0ec9b9ea 21-Oct-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Check for active reservation in gfs2_release

In gfs2_release, check if the inode has an active reservation to avoid
unnecessary lock taking.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 549c7297 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

fs: make helpers idmap mount aware

Extend some inode methods with an additional user namespace argument. A
filesystem that is aware of idmapped mounts will receive the user
namespace the mount has been marked with. This can be used for
additional permission checking and also to enable filesystems to
translate between uids and gids if they need to. We have implemented all
relevant helpers in earlier patches.

As requested we simply extend the existing inode method instead of
introducing new ones. This is a little more code churn but it's mostly
mechanical and doesn't leave us with additional inode methods.

Link: https://lore.kernel.org/r/20210121131959.646623-25-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


# 21cb47be 21-Jan-2021 Christian Brauner <christian.brauner@ubuntu.com>

inode: make init and permission helpers idmapped mount aware

The inode_owner_or_capable() helper determines whether the caller is the
owner of the inode or is capable with respect to that inode. Allow it to
handle idmapped mounts. If the inode is accessed through an idmapped
mount it is mapped according to the mount's user namespace. Afterwards the checks
are identical to non-idmapped mounts. If the initial user namespace is
passed nothing changes so non-idmapped mounts will see identical
behavior as before.

Similarly, allow the inode_init_owner() helper to handle idmapped
mounts. It initializes a new inode on idmapped mounts by mapping the
fsuid and fsgid of the caller from the mount's user namespace. If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-7-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


# 3aac630b 12-Jan-2021 Eric Biggers <ebiggers@google.com>

gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()

The I_DIRTY_TIME flag is primarily used within the VFS, and there's no
reason for ->fsync() implementations to do anything with it. This is
because when !datasync, the VFS will expire dirty timestamps before
calling ->fsync(). (See vfs_fsync_range().) This turns I_DIRTY_TIME
into I_DIRTY_SYNC.

Therefore, change gfs2_fsync() to not check for I_DIRTY_TIME.

Link: https://lore.kernel.org/r/20210112190253.64307-11-ebiggers@kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 2f632965 23-Jan-2021 Christoph Hellwig <hch@lst.de>

iomap: pass a flags argument to iomap_dio_rw

Pass a set of flags to iomap_dio_rw instead of the boolean
wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag
replaces the wait_for_completion, but only needs to be passed
when the iocb isn't synchronous to start with to simplify the
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[djwong: rework xfs_file.c so that we can push iomap changes separately]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 60263d58 23-Jul-2020 Christoph Hellwig <hch@lst.de>

iomap: fall back to buffered writes for invalidation failures

Failing to invalidate the page cache means data is incoherent, which is
a very bad state for the system. Always fall back to buffered I/O
through the page cache if we can't invalidate mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>


# 4c5c3010 29-Jun-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Pass glock holder to gfs2_file_direct_{read,write}

Pass a pointer to the existing glock holder from
gfs2_file_{read,write}_iter to gfs2_file_direct_{read,write}
to save some stack space.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 20f82999 01-Jul-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rework read and page fault locking

So far, gfs2 has taken the inode glocks inside the ->readpage and
->readahead address space operations. Since commit d4388340ae0b ("fs:
convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
the pages to read ahead locked. With that, the current holder of the
inode glock may be trying to lock one of those pages while
gfs2_readahead is trying to take the inode glock, resulting in a
deadlock.

Fix that by moving the lock taking to the higher-level ->read_iter file
and ->fault vm operations. This also gets rid of an ugly lock inversion
workaround in gfs2_readpage.

The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks. If a page is not cached,
filesystem locks must be taken before populating the page cache.

To avoid taking the inode glock when the data is already cached,
gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
set. If that fails, the inode glock is taken and the operation is
retried with the IOCB_NOIO flag cleared.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
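
The two-pass read described above can be sketched in user space as follows. struct demo_file, demo_read and demo_read_iter are hypothetical; the noio argument merely plays the role that IOCB_NOIO plays in the text, and locking is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_file {
	bool cached;	/* is the data already in the (simulated) page cache? */
	int locks;	/* how often the (simulated) inode lock was taken */
};

/* Returns 1 on success, or -1 if I/O would be needed but noio is set. */
static int demo_read(struct demo_file *f, bool noio)
{
	if (!f->cached) {
		if (noio)
			return -1;
		f->cached = true;	/* populate the cache */
	}
	return 1;
}

static int demo_read_iter(struct demo_file *f)
{
	int ret;

	assert(f != NULL);
	ret = demo_read(f, true);	/* fast path: cache only, no lock */
	if (ret >= 0)
		return ret;
	f->locks++;			/* slow path: take the lock ... */
	ret = demo_read(f, false);	/* ... and retry with I/O allowed */
	return ret;			/* lock would be dropped here */
}
```

In this model, the first read of uncached data takes the lock once, and later reads of the same data take no lock at all.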


# 4bd684bc 06-Mar-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove unnecessary gfs2_qa_{get,put} pairs

We now get the quota data structure when opening a file writable and put it
when closing that writable file descriptor, so there no longer is a need for
gfs2_qa_{get,put} while we're holding a writable file descriptor.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1595548f 06-Mar-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put

Keeping reservations and quotas separate helps reviewing the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 2fba46a0 26-Feb-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Change inode qa_data to allow multiple users

Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.

This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
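
A get/put scheme of the kind described above can be sketched as follows. The demo_* names are hypothetical, and the sketch omits the locking that real concurrent callers would need; it only shows the counting rule.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_qadata { int refs; };
struct demo_inode  { struct demo_qadata *qa; };

/* Allocate the structure on first use; every caller takes a reference. */
static int demo_qa_get(struct demo_inode *ip)
{
	if (!ip->qa) {
		ip->qa = calloc(1, sizeof(*ip->qa));
		if (!ip->qa)
			return -1;
	}
	ip->qa->refs++;
	return 0;
}

/* Drop a reference; the last put frees the structure. */
static void demo_qa_put(struct demo_inode *ip)
{
	assert(ip->qa && ip->qa->refs > 0);
	if (--ip->qa->refs == 0) {
		free(ip->qa);
		ip->qa = NULL;
	}
}
```

With two holders, the first put leaves the structure in place for the other holder, and only the second put frees it.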


# d580712a 06-Mar-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc

Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 6e5e41e2 14-Jan-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: fix O_SYNC write handling

In gfs2_file_write_iter, for direct writes, the error checking in the buffered
write fallback case is incomplete. This can cause inode write errors to go
undetected. Fix and clean up gfs2_file_write_iter along the way.

Based on a proposed fix by Christoph Hellwig <hch@lst.de>.

Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4c0e8dda 15-Jan-2020 Christoph Hellwig <hch@lst.de>

gfs2: move setting current->backing_dev_info

Set current->backing_dev_info just around the buffered write calls to
prepare for the next fix.

Fixes: 967bcc91b044 ("gfs2: iomap direct I/O support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# eb43e660 14-Nov-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Introduce function gfs2_withdrawn

Add function gfs2_withdrawn and replace all checks for the SDF_WITHDRAWN
bit to call it. This does not change the logic or function of gfs2, and
it facilitates later improvements to the withdraw sequence.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 184b4e60 06-Nov-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix end-of-file handling in gfs2_page_mkwrite

When the filesystem block size is smaller than the page size, the last
page may contain blocks that lie entirely beyond the end of the file.
Make sure to only allocate blocks that lie at least partially in the
file. Allocating blocks beyond that isn't useful, and what's more, they
will not be zeroed out and may end up containing random data.

With that change in place, make sure we'll still always unstuff stuffed
inodes: iomap_writepage and iomap_writepages currently can't handle
stuffed files.

In addition, simplify and move the end-of-file check further to the top
in gfs2_page_mkwrite to avoid weird side effects like unstuffing when
we're not.

Fixes xfstest generic/263.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
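
The rule "only allocate blocks that lie at least partially in the file" can be illustrated with the arithmetic below. demo_backing_length is a hypothetical helper, not the gfs2 function; it assumes the page size is a multiple of the block size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * For the page starting at file offset pos, return how many bytes need
 * backing blocks in a file of the given size: the part of the page
 * inside the file, rounded up to a whole number of blocks.
 */
static uint64_t demo_backing_length(uint64_t pos, uint64_t size,
				    uint64_t page_size, uint64_t block_size)
{
	uint64_t len;

	assert(block_size > 0 && page_size % block_size == 0);
	if (pos >= size)
		return 0;	/* page lies entirely beyond end of file */
	len = size - pos;
	if (len > page_size)
		len = page_size;
	/* Round up so a partially used block is still counted. */
	return (len + block_size - 1) / block_size * block_size;
}
```

For example, with 4096-byte pages and 1024-byte blocks, a 100-byte file needs one block for its first page rather than four.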


# f53056c4 07-Nov-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Multi-block allocations in gfs2_page_mkwrite

In gfs2_page_mkwrite's gfs2_allocate_page_backing helper, try to
allocate as many blocks at once as we need. Pass in the size of the
requested allocation.

Fixes: 35af80aef99b ("gfs2: don't use buffer_heads in gfs2_allocate_page_backing")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f3b64b57 31-Aug-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Some whitespace cleanups

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 8d098070 03-Jun-2019 Arnd Bergmann <arnd@arndb.de>

gfs2: add compat_ioctl support

Out of the four ioctl commands supported on gfs2, only FITRIM
works in compat mode.

Add a proper handler based on the ext4 implementation.

Fixes: 6ddc5c3ddf25 ("gfs2: getlabel support")
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>


# 13ef9544 15-Oct-2019 Jan Kara <jack@suse.cz>

iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()

Filesystems do not support asynchronous IO in some cases, for example
for unaligned writes or when the file size needs to be extended (e.g.
for ext4). Instead of forcing the filesystem to wait for AIO
in such cases, add an argument to iomap_dio_rw() which makes the function
wait for IO completion. This also results in executing
iomap_dio_complete() inline in iomap_dio_rw() providing its return value
to the caller as for ordinary sync IO.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 45eb0504 02-Sep-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor PAGE_SIZE arithmetic cleanups

Replace divisions by PAGE_SIZE with shifts by PAGE_SHIFT and similar.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
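
The equivalence this kind of cleanup relies on can be shown with a short sketch. The DEMO_* constants are illustrative and assume a 4096-byte page; the kernel's PAGE_SHIFT and PAGE_SIZE depend on the architecture.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (UINT64_C(1) << DEMO_PAGE_SHIFT)

/* Page index of a byte position: same as pos / DEMO_PAGE_SIZE. */
static uint64_t demo_page_index(uint64_t pos)
{
	return pos >> DEMO_PAGE_SHIFT;
}

/* Offset within the page: same as pos % DEMO_PAGE_SIZE. */
static uint64_t demo_offset_in_page(uint64_t pos)
{
	return pos & (DEMO_PAGE_SIZE - 1);
}

/* Sanity check: shift and mask agree with division and modulo. */
static void demo_check(uint64_t pos)
{
	assert(demo_page_index(pos) == pos / DEMO_PAGE_SIZE);
	assert(demo_offset_in_page(pos) == pos % DEMO_PAGE_SIZE);
}
```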


# 0a6a4abc 08-Aug-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Always mark inode dirty in fallocate

When allocating space with fallocate, always update the file timestamps
and mark the inode dirty, no matter if the FALLOC_FL_KEEP_SIZE flag is
set or not. The inode needs to be marked dirty so that a subsequent
fsync will pick it up and any new allocations will make it to disk.
Filesystems like xfs and ext4 always update the timestamps, so make
gfs2 behave the same way.

Fixes xfstest generic/483.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 35af80ae 01-Jul-2019 Christoph Hellwig <hch@lst.de>

gfs2: don't use buffer_heads in gfs2_allocate_page_backing

Rewrite gfs2_allocate_page_backing to call gfs2_iomap_get_alloc and operate on
struct iomap directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f3915f83 01-Jul-2019 Christoph Hellwig <hch@lst.de>

gfs2: use page_offset in gfs2_page_mkwrite

Without casting page->index to a guaranteed 64-bit type, the value might be
treated as 32-bit on 32-bit platforms and thus get truncated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
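
The truncation described above can be reproduced in user space. In this sketch uint32_t stands in for a 32-bit page index type, and the DEMO_PAGE_SHIFT value and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* Shift performed in 32-bit arithmetic: the high bits are lost. */
static uint64_t demo_offset_truncating(uint32_t index)
{
	return index << DEMO_PAGE_SHIFT;
}

/* Widen to 64 bits first, then shift: the full offset survives. */
static uint64_t demo_offset_widened(uint32_t index)
{
	return (uint64_t)index << DEMO_PAGE_SHIFT;
}

static void demo_show_truncation(void)
{
	uint32_t index = UINT32_C(1) << 20;	/* byte offset 2^32 */

	assert(demo_offset_truncating(index) == 0);
	assert(demo_offset_widened(index) == UINT64_C(1) << 32);
}
```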


# 5aca2842 01-Jul-2019 Darrick J. Wong <darrick.wong@oracle.com>

vfs: create a generic checking and prep function for FS_IOC_SETFLAGS

Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.

Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 04aea0ca 07-May-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN

Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7336d0e6 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 81214bab 04-Dec-2018 Christoph Hellwig <hch@lst.de>

iomap: wire up the iopoll method

Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device. Also refactor the common direct I/O bio submission code into a
nice little helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Modified to use bio_set_polled().

Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4d62d3f7 29-Nov-2018 NeilBrown <neilb@suse.com>

gfs2: properly initialize file_lock used for unlock.

Rather than assuming all-zeros is sufficient, use the available API to
initialize the file_lock structure used for unlock. VFS-level changes
will soon make it important that the list_heads in file_lock are
always properly initialized.

Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
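
Why an all-zeros structure is not a properly initialized list head can be seen in a minimal model of a circular doubly linked list. The demo_* names are hypothetical, but the convention shown (an empty list head points to itself) is the one the kernel's list API uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* An empty list head points to itself in both directions. */
static void demo_init_list_head(struct demo_list_head *h)
{
	assert(h != NULL);
	h->next = h;
	h->prev = h;
}

/* "Empty" means next points back at the head, not that it is NULL. */
static bool demo_list_empty(const struct demo_list_head *h)
{
	return h->next == h;
}
```

A zero-filled head has next == NULL, so it does not test as empty, and code that walks or unlinks it would dereference a NULL pointer.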


# 21f09c43 30-Aug-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Move rs_{sizehint, rgd_gh} fields into the inode

Move the rs_sizehint and rs_rgd_gh fields from struct gfs2_blkreserv
into the inode: they are more closely related to the inode than to a
particular reservation.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>


# 6ddc5c3d 08-Oct-2018 Steve Whitehouse <swhiteho@redhat.com>

gfs2: getlabel support

Add support for the GETFSLABEL ioctl in gfs2.
I tested this patch and it works as expected.

Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>
Tested-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1d45bb7f 26-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Use iomap for stuffed direct I/O reads

Remove the fallback code from direct to buffered I/O for stuffed reads.

For stuffed writes, we must keep the fallback code: the deferred glock
we are holding under direct I/O doesn't allow writing to the inode or
changing the file size.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# c2589282 06-Jul-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: fallocate_chunk: Always initialize struct iomap

In fallocate_chunk, always initialize the iomap before calling
gfs2_iomap_get_alloc: future changes could otherwise cause things like
iomap.flags to leak across calls.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 109dbb1e 02-Jul-2018 Souptick Joarder <jrdr.linux@gmail.com>

fs: gfs2: Adding new return type vm_fault_t

Use new return type vm_fault_t for gfs2_page_mkwrite
handler.

see commit 1c8f422059ae ("mm: change return type to
vm_fault_t") for reference.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 967bcc91 19-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: iomap direct I/O support

The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 64bc06bb 24-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: iomap buffered write support

With the traditional page-based writes, blocks are allocated separately
for each page written to. With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# d505a96a 24-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Further iomap cleanups

In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage. This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Further clean up gfs2_iomap_get and implement gfs2_stuffed_iomap here
directly. For reads beyond the end of the file, return holes instead of
failing with -ENOENT so that we can get rid of that special case in
gfs2_block_map.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 628e366d 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Iomap cleanups and improvements

Clean up gfs2_iomap_alloc and gfs2_iomap_get. Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# fffb6412 29-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Zero out fallocated blocks in fallocate_chunk

Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack. This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag. I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gfs2_block_map to gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>


# 174d1232 20-Feb-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix fallocate chunk size

The chunk size of allocations in __gfs2_fallocate is calculated
incorrectly. The size can collapse, causing __gfs2_fallocate to
allocate one block at a time, which is very inefficient. This needs
fixing in two places:

In gfs2_quota_lock_check, always set ap->allowed to UINT_MAX to indicate
that there is no quota limit. This fixes callers that rely on
ap->allowed to be set even when quotas are off.

In __gfs2_fallocate, reset max_blks to UINT_MAX in each iteration of the
loop to make sure that allocation limits from one resource group won't
spill over into another resource group.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
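
A minimal userspace sketch of the second fix described above, under stated assumptions: this is a model, not the kernel code. The function count_chunks and its parameters are hypothetical; each loop iteration is assumed to allocate from one resource group whose free space caps the chunk size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical model, not the kernel implementation: every iteration
 * allocates from one "resource group" whose limit caps the chunk size.
 * Returns the number of iterations needed to allocate total_blks.
 */
static unsigned int count_chunks(const unsigned int *rgrp_limit, size_t nr_rgrps,
                                 unsigned int total_blks, int reset_each_iter)
{
    unsigned int max_blks = UINT_MAX;
    unsigned int iterations = 0;
    size_t i = 0;

    assert(nr_rgrps > 0);
    while (total_blks > 0) {
        unsigned int limit = rgrp_limit[i++ % nr_rgrps];
        unsigned int chunk;

        assert(limit > 0);
        if (reset_each_iter)
            max_blks = UINT_MAX;   /* the fix: forget the previous group's limit */
        if (limit < max_blks)
            max_blks = limit;      /* limit imposed by the current group */
        chunk = total_blks < max_blks ? total_blks : max_blks;
        total_blks -= chunk;
        iterations++;
    }
    return iterations;
}
```

In this model, a nearly full first group followed by a roomy second group needs two iterations with the reset, while without it the small limit sticks and the loop degrades to one block per iteration.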


# 805c0907 08-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Log the reason for log flushes in every log header

This patch just adds the capability for GFS2 to track which function
called gfs2_log_flush. This should make it easier to diagnose
problems based on the sequence of events found in the journals.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# c1696fb8 16-Jan-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Introduce new gfs2_log_header_v2

This patch adds a new structure called gfs2_log_header_v2 which is used
to store expanded fields into previously unused areas of the log headers
(i.e., this change is backwards compatible). Some of these are used for
debug purposes so we can backtrack when problems occur. Others are
reserved for future expansion.

This patch is based on a prototype from Steve Whitehouse.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4e56a641 14-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE)

Implement the top-level bits of punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b16f7e57 09-Oct-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix and clean up {GET,SET}FLAGS ioctl

Switch to a simple array for mapping between the FS_*_FL and GFS2_DIF_*
flags. Clarify how the mapping between FS_JOURNAL_DATA_FL and the
filesystem flags works. The GFS2_DIF_SYSTEM flag cannot be set from
user space, so remove it from GFS2_FLAGS_USER_SET. Fail with -EINVAL
when trying to set flags that are not supported instead of silently
ignoring those flags.

Partially fixes xfstest generic/424.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
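
The array-based mapping and the -EINVAL behaviour described above can be sketched roughly as follows. This is an illustration only: the flag names and values (USER_FL_*, DISK_FL_*), the table, and user_to_disk_flags are all hypothetical and do not reproduce the real FS_*_FL or GFS2_DIF_* definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for this illustration. */
#define USER_FL_SYNC      0x1u
#define USER_FL_IMMUTABLE 0x2u
#define USER_FL_APPEND    0x4u

#define DISK_FL_SYNC      0x100u
#define DISK_FL_IMMUTABLE 0x200u
#define DISK_FL_APPEND    0x400u

struct flag_map {
    uint32_t user;  /* flag as seen by user space */
    uint32_t disk;  /* corresponding on-disk flag */
};

static const struct flag_map flag_table[] = {
    { USER_FL_SYNC,      DISK_FL_SYNC },
    { USER_FL_IMMUTABLE, DISK_FL_IMMUTABLE },
    { USER_FL_APPEND,    DISK_FL_APPEND },
};

/* Translate user flags; reject any bit that has no table entry. */
static int user_to_disk_flags(uint32_t user, uint32_t *disk)
{
    uint32_t out = 0;
    size_t i;

    assert(disk != NULL);
    for (i = 0; i < sizeof(flag_table) / sizeof(flag_table[0]); i++) {
        if (user & flag_table[i].user) {
            out |= flag_table[i].disk;
            user &= ~flag_table[i].user;
        }
    }
    if (user)
        return -EINVAL;  /* unsupported flags are an error, not ignored */
    *disk = out;
    return 0;
}
```

Clearing each recognised bit as it is translated leaves exactly the unsupported bits behind, which makes the final check a single comparison.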


# cc555b09 20-Sep-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Take inode off order_write list when setting jdata flag

This patch fixes a deadlock caused when the jdata flag is set for
inodes that are already on the ordered write list. Since it is
on the ordered write list, log_flush calls gfs2_ordered_write which
calls filemap_fdatawrite. But since the inode had the jdata flag
set, that calls gfs2_jdata_writepages, which tries to start a new
transaction. A new transaction cannot be started because it tries
to acquire the log_flush rwsem which is already locked by the log
flush operation.

The bottom line is: We cannot switch an inode from ordered to jdata
until we eliminate any ordered data pages (via log flush) or any
log_flush operation afterward will create the circular dependency
above. So we need to flush the log before setting the diskflags to
switch the file mode, then we need to remove the inode from the
ordered writes list.

Before this patch, the log flush was done for jdata->ordered, but
that's wrong. If we're going from jdata to ordered, we don't need
to call gfs2_log_flush because the call to filemap_fdatawrite will
do it for us:

filemap_fdatawrite() -> __filemap_fdatawrite_range()
__filemap_fdatawrite_range() -> do_writepages()
do_writepages() -> gfs2_jdata_writepages()
gfs2_jdata_writepages() -> gfs2_log_flush()

This patch modifies function do_gfs2_set_flags so that if a file
has its jdata flag set, and it's already on the ordered write list,
the log will be flushed and it will be removed from the list
before setting the flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Abhijith Das <adas@redhat.com>


# 3a27411c 15-Mar-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Implement SEEK_HOLE / SEEK_DATA via iomap

So far, lseek on gfs2 did not report holes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 9b7c2ddb 22-Sep-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Update ctime in setflags ioctl

The FS_IOC_SETFLAGS ioctl is supposed to update the inode ctime.
Fixes xfstests generic/277.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# d07a6ac7 07-Jul-2017 Jeff Layton <jlayton@kernel.org>

gfs2: convert to errseq_t based writeback error reporting for fsync

Also, fix a place where a writeback error might get dropped in the
gfs2_is_jdata case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>


# 283c9a97 17-Jul-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Lock holder cleanup (fixup)

Function gfs2_holder_initialized should be used in do_flock as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# d4d7fc12 05-Apr-2017 Andrew Price <anprice@redhat.com>

gfs2: Re-enable fallocate for the rindex

Commit 86066914edff2316cbed63aac8a87d5001441a16 "gfs2: Don't support
fallocate on jdata files" removed the ability of gfs2_grow to reserve
space at the end of the rindex, which could prevent a second gfs2_grow
from succeeding if the fs is full. Allow fallocate to work on the rindex
once again.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 11bac800 24-Feb-2017 Dave Jiang <dave.jiang@intel.com>

mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf

->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7c0f6ba6 24-Dec-2016 Linus Torvalds <torvalds@linux-foundation.org>

Replace <asm/uaccess.h> with <linux/uaccess.h> globally

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 82c156f8 22-Sep-2016 Al Viro <viro@zeniv.linux.org.uk>

switch generic_file_splice_read() to use of ->read_iter()

... and kill the ->splice_read() instances that can be switched to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d7c436cd 26-Sep-2016 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Update file times after grabbing glock

In gfs2_page_mkwrite, grab the inode glock in EX mode before calling
file_update_time: grabbing the lock may result in a call to
gfs2_dinode_in, which will reset the file times to their on-disk state.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 6df9f9a2 17-Jun-2016 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Lock holder cleanup

Make the code more readable by cleaning up the different ways of
initializing lock holders and checking for initialized lock holders:
mark lock holders as uninitialized by setting the holder's glock to NULL
(gfs2_holder_mark_uninitialized) instead of zeroing out the entire
object or using a separate flag. Recognize initialized holders by their
non-NULL glock (gfs2_holder_initialized). Don't zero out holder objects
which are immediately initialized via gfs2_holder_init or
gfs2_glock_nq_init.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
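
The convention described above, where a NULL glock pointer marks an uninitialized holder, can be modelled in a few lines. This is a simplified userspace sketch: the model_* types and functions are hypothetical stand-ins, not the kernel's struct gfs2_holder or its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct model_glock {
    int id;
};

struct model_holder {
    struct model_glock *gh_gl;  /* NULL means "not initialized" */
    int gh_state;
};

/* Mark a holder as unused without zeroing the whole object. */
static void model_holder_mark_uninitialized(struct model_holder *gh)
{
    gh->gh_gl = NULL;
}

/* A holder counts as initialized exactly when it points at a glock. */
static bool model_holder_initialized(const struct model_holder *gh)
{
    return gh->gh_gl != NULL;
}

/* Initialization sets every field, so no prior zeroing is needed. */
static void model_holder_init(struct model_glock *gl, int state,
                              struct model_holder *gh)
{
    assert(gl != NULL);
    gh->gh_gl = gl;
    gh->gh_state = state;
}
```

A single pointer serves as both the link to the glock and the "initialized" marker, so no separate flag can fall out of sync with it.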


# 1d1bb236 12-May-2016 Al Viro <viro@zeniv.linux.org.uk>

gfs2: switch to ->iterate_shared()

protected by glock and already used without locking the directory
by gfs2_get_name()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 80f4781d 02-May-2016 Abhi Das <adas@redhat.com>

gfs2: use inode_lock/unlock instead of accessing i_mutex directly

i_mutex has been replaced by i_rwsem and directly accessing the
non-existent i_mutex breaks the kernel build.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# dde0c2e7 07-Apr-2016 Christoph Hellwig <hch@lst.de>

fs: add IOCB_SYNC and IOCB_DSYNC

This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.

XXX: Will need a few additional audits for O_DSYNC

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9c7fe835 19-Apr-2016 Daniel DeFreez <dcdefreez@ucdavis.edu>

GFS2: Add calls to gfs2_holder_uninit in two error handlers

This patch fixes two locations that do not call gfs2_holder_uninit
if gfs2_glock_nq returns an error.

Signed-off-by: Daniel DeFreez <dcdefreez@ucdavis.edu>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 61152675 04-Apr-2016 Abhi Das <adas@redhat.com>

gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()

gfs2_file_splice_read() f_op grabs and releases the cluster-wide
inode glock to sync the inode size to the latest.

Without this, generic_file_splice_read() uses an older i_size value
and can return EOF for valid offsets in the inode.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the
page cache with bigger chunks than PAGE_SIZE.

This promise never materialized, and it is unlikely that it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5955102c 22-Jan-2016 Al Viro <viro@zeniv.linux.org.uk>

wrappers for ->i_mutex access

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a93a9983 22-Dec-2015 Junxiao Bi <junxiao.bi@oracle.com>

gfs2: fix flock panic issue

Commit 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
moved the flock/POSIX lock identification code to locks_lock_inode_wait(),
but failed to set fl_flags to FL_FLOCK, which causes a kernel panic in
locks_lock_inode_wait().

Fixes: 4f6563677ae8 ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b58bf407 24-Jul-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Reduce size of incore inode

This patch makes no functional changes. Its goal is to reduce the
size of the gfs2 inode in memory by rearranging structures and
changing the size of some variables within the structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# a097dc7e 16-Jul-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Make rgrp reservations part of the gfs2_inode structure

Before this patch, multi-block reservation structures were allocated
from a special slab. This patch folds the structure into the gfs2_inode
structure. The disadvantage is that the gfs2_inode needs more memory,
even when a file is opened read-only. The advantages are: (a) we don't
need the special slab and the extra time it takes to allocate and
deallocate from it. (b) we no longer need to worry that the structure
exists for things like quota management. (c) This also allows us to
remove the calls to get_write_access and put_write_access since we
know the structure will exist.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b54e9a0b 26-Oct-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Extract quota data from reservations structure (revert 5407e24)

This patch basically reverts the majority of patch 5407e24.
That patch eliminated the gfs2_qadata structure in favor of just
using the reservations structure. The problem with doing that is that
it increases the size of the reservations structure. That is not an
issue until it comes time to fold the reservations structure into the
inode in memory so we know it's always there. By separating out the
quota structure again, we aren't punishing the non-quota users by
making all the inodes bigger, requiring more slab space. This patch
creates a new slab area to allocate the quota stuff so it's managed
a little more sanely.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# acc546fd 10-Nov-2015 Abhi Das <adas@redhat.com>

gfs2: Automatically set GFS2_DIF_SYSTEM flag on system files

When new files and directories are created inside a parent directory
we automatically inherit the GFS2_DIF_SYSTEM flag (if set) and assign
it to the new file/dirs.

All new system files/dirs created in the metafs by, say gfs2_jadd,
will have this flag set because they will have parent directories in
the metafs whose GFS2_DIF_SYSTEM flag has already been set (most likely
by a previous mkfs.gfs2)

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 4f656367 22-Oct-2015 Benjamin Coddington <bcodding@redhat.com>

Move locks API users to locks_lock_inode_wait()

Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait(). This
allows for some later cleanup.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>


# 4b813f09 22-Sep-2015 Andrew Price <anprice@redhat.com>

GFS2: fallocate: do not rely on file_update_time to mark the inode dirty

Previously __gfs2_fallocate() relied on file_update_time() marking the
inode dirty, but that's not a safe assumption as that function doesn't
dirty the inode in some cases. Mark the inode dirty explicitly.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 86066914 09-Jun-2015 Abhi Das <adas@redhat.com>

gfs2: Don't support fallocate on jdata files

We cannot provide an efficient implementation due to the headers
on the data blocks, so there doesn't seem much point in having it.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 01e64ee4 04-May-2015 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: make sure S_NOSEC flag isn't overwritten

At the end of gfs2_set_inode_flags inode->i_flags is set to flags, so
we should be modifying flags instead of inode->i_flags, so it isn't
overwritten.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 2ba48ce5 09-Apr-2015 Al Viro <viro@zeniv.linux.org.uk>

mirror O_APPEND and O_DIRECT into iocb->ki_flags

... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5d5d5689 03-Apr-2015 Al Viro <viro@zeniv.linux.org.uk>

make new_sync_{read,write}() static

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# e2e40f2c 22-Feb-2015 Christoph Hellwig <hch@lst.de>

fs: move struct kiocb to fs.h

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d9be0cda 17-Mar-2015 Abhi Das <adas@redhat.com>

gfs2: allow fallocate to max out quotas/fs efficiently

We can quickly get an estimate of how many blocks are available
for allocation restricted by quota and fs size respectively, using
the ap->allowed field in the gfs2_alloc_parms structure.
gfs2_quota_check() and gfs2_inplace_reserve() provide these values.

Once we have the total number of blocks available to us, we can
compute how many bytes of data can be written using those blocks
instead of guessing inefficiently.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# b8fbf471 17-Mar-2015 Abhi Das <adas@redhat.com>

gfs2: perform quota checks against allocation parameters

Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
and gfs2_quota_lock_check() to check for quota violations while
accounting for the new blocks requested by the current operation
in ap->target.

Previously, the number of new blocks requested during an operation
was not accounted for during quota_check, which allowed these
operations to exceed quota. This was not very apparent since most
operations allocated only 1 block at a time and quotas would get
violated in the next operation, i.e. the quota excess would only be
1 block or so. With fallocate (where we allocate a bunch of blocks
at once), the quota excess is non-trivial and is addressed by this
patch.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# f1ea6f4e 24-Feb-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Move gfs2_file_splice_write outside of #ifdef

This patch moves function gfs2_file_splice_write so it's not
conditionally compiled.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# f42a69fa 19-Feb-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Allocate reservation during splice_write

This patch adds a GFS2-specific function for splice_write which
first calls function gfs2_rs_alloc to make sure a reservation
structure has been allocated before attempting to reserve blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# d83a08db 10-Feb-2015 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0ae45f63 01-Feb-2015 Theodore Ts'o <tytso@mit.edu>

vfs: add support for a lazytime mount option

Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.

This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.

For workloads which feature a large number of random writes to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).

Google-Bug-Id: 18297052

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 98f1a696 12-Nov-2014 Andrew Price <anprice@redhat.com>

GFS2: Update timestamps on fallocate

gfs2_fallocate() wasn't updating ctime and mtime when modifying the
inode. Add a call to file_update_time() to do that.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1885867b 12-Nov-2014 Andrew Price <anprice@redhat.com>

GFS2: Update i_size properly on fallocate

This addresses an issue caught by fsx where the inode size was not being
updated to the expected value after fallocate(2) with mode 0.

The problem was caused by the offset and len parameters being converted
to multiples of the file system's block size, so i_size would be rounded
up to the nearest block size multiple instead of the requested size.

This replaces the per-chunk i_size updates with a single i_size_write on
successful completion of the operation. With this patch gfs2 gets
through a complete run of fsx.

For clarity, the check for (error == 0) following the loop is removed as
all failures before that point jump to out_* labels or return.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
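
The rounding problem described above can be illustrated with a small userspace sketch. It is a model under stated assumptions, not the kernel code: aligned_end and size_after_fallocate are hypothetical helpers, and the block size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of bsize (bsize must be a power of two). */
static uint64_t round_up_blk(uint64_t x, uint64_t bsize)
{
    return (x + bsize - 1) & ~(bsize - 1);
}

/* End of the block-aligned range that actually gets allocated. */
static uint64_t aligned_end(uint64_t offset, uint64_t len, uint64_t bsize)
{
    assert(bsize != 0 && (bsize & (bsize - 1)) == 0);
    return round_up_blk(offset + len, bsize);
}

/*
 * File size after a successful mode-0 fallocate in this model: the
 * requested end (offset + len), never the block-aligned end, and the
 * file never shrinks.
 */
static uint64_t size_after_fallocate(uint64_t old_size, uint64_t offset,
                                     uint64_t len)
{
    uint64_t end = offset + len;

    return end > old_size ? end : old_size;
}
```

With 4096-byte blocks, a request for 5000 bytes at offset 100 allocates up to byte 8192, but the resulting file size should be 5100; using the aligned end as the size is the bug described in the commit message.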


# 9c9f1159 12-Nov-2014 Andrew Price <anprice@redhat.com>

GFS2: Use inode_newsize_ok and get_write_access in fallocate

gfs2_fallocate wasn't checking inode_newsize_ok nor get_write_access.
Split out the context setup and inode locking pieces into a separate
function to make it more clear and add these missing calls.

inode_newsize_ok is called conditional on FALLOC_FL_KEEP_SIZE as there
is no need to enforce a file size limit if it isn't going to change.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 33ad5d54 29-Oct-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Only increase rs_sizehint

If an application does a sequence of (1) big write, (2) little write
we don't necessarily want to reset the size hint based on the smaller
size. The fact that they did any big writes implies they may do more,
and therefore we should try to allocate bigger block reservations, even
if the last few were small writes. Therefore this patch changes function
gfs2_size_hint so that the size hint can only grow; it cannot shrink.
This is especially important where there are multiple writers.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1c994a09 27-Aug-2014 Jeff Layton <jlayton@kernel.org>

locks: consolidate "nolease" routines

GFS2 and NFS have setlease routines that always just return -EINVAL.
Turn that into a generic routine that can live in fs/libfs.c.

Cc: <linux-nfs@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: <cluster-devel@redhat.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 2ddfbdd6 19-Aug-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Request demote when a "try" flock fails

This patch changes the flock code so that it uses the TRY_1CB flag
instead of the TRY flag on the first attempt. That forces any holding
nodes to issue a dlm callback, which requests a demote of the glock.
Then, if the "try" failed, it sleeps a small amount of time for the
demote to occur. Then it tries again, for an increasing amount of time.
Subsequent attempts to gain the "try" lock don't use "_1CB" so that
only one callback is issued.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 97a4f1d7 26-Jun-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Allow caching of glocks for flock

This patch removes the GLF_NOCACHE flag from the glocks associated with
flocks. There should be no good reason not to cache glocks for flocks:
they only force the glock to be demoted before they can be reacquired,
which can slow down performance and even cause glock hangs, especially
in cases where the flocks are held in Shared (SH) mode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5bef3e7c 26-Jun-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Allow flocks to use normal glock dq rather than dq_wait

This patch allows flock glocks to use a non-blocking dequeue rather
than dq_wait. It also reverts the previous patch I had posted regarding
dq_wait. The reverted patch isn't necessarily a bad idea, but I decided
this might avoid unforeseen side effects, and was therefore safer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8d020765 05-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

->splice_write() via ->write_iter()

iter_file_splice_write() - a ->splice_write() instance that gathers the
pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
it to ->write_iter(). A bunch of simple cases converted to that...

[AV: fixed the braino spotted by Cyrill]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9dd868e1 15-May-2014 Fabian Frederick <fabf@skynet.be>

GFS2: fs/gfs2/file.c: kernel-doc warning fixes

Related function is not gfs2_set_flags but do_gfs2_set_flags

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 24972557 01-May-2014 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: remove transaction glock

GFS2 has a transaction glock, which must be grabbed for every
transaction, whose purpose is to deal with freezing the filesystem.
Aside from this involving a large amount of locking, it is very easy to
make the current fsfreeze code hang on unfreezing.

This patch rewrites how gfs2 handles freezing the filesystem. The
transaction glock is removed. In its place is a freeze glock, which is
cached (but not held) in a shared state by every node in the cluster
when the filesystem is mounted. This lock only needs to be grabbed on
freezing, and actions which need to be safe from freezing, like
recovery.

When a node wants to freeze the filesystem, it grabs this glock
exclusively. When the freeze glock state changes on the nodes (either
from shared to unlocked, or shared to exclusive), the filesystem does a
special log flush. gfs2_log_flush() does all the work for flushing out
and shutting down the incore log, and then it tries to grab the
freeze glock in a shared state again. Since the filesystem is stuck in
gfs2_log_flush, no new transaction can start, and nothing can be written
to disk. Unfreezing the filesystem simply involves dropping the freeze
glock, allowing gfs2_log_flush() to grab and then release the shared
lock, so it is cached for next time.

However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
shared lock on the filesystem root directory inode to check permissions.
If that glock has already been grabbed exclusively, fsfreeze will be
unable to get the shared lock and unfreeze the filesystem.

In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
on the filesystem root directory during the freeze, and hold it until it
unfreezes the filesystem. The functions which need to grab a shared
lock in order to allow the unfreeze ioctl to be issued now use the lock
grabbed by the freeze code instead.

The freeze and unfreeze code take care to make sure that this shared
lock will not be dropped while another process is using it.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# da56e45b 03-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

gfs2: switch to ->write_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# aad4f8bb 02-Apr-2014 Al Viro <viro@zeniv.linux.org.uk>

switch simple generic_file_aio_read() users to ->read_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# f1820361 07-Apr-2014 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm: implement ->map_pages for page cache

filemap_map_pages() is a generic implementation of ->map_pages() for
filesystems that use the page cache.

It should be safe to use filemap_map_pages() for ->map_pages() if the
filesystem uses filemap_fault() for ->fault().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ning Qu <quning@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a0846a53 06-Feb-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Lock i_mutex and use a local gfs2_holder for fallocate

This patch causes GFS2 to lock the i_mutex during fallocate. It
also switches from using a dinode's inode glock to using a local
holder like the other GFS2 i_operations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7b9cff46 02-Oct-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add allocation parameters structure

This patch adds a structure to contain allocation parameters with
the intention of future expansion of this structure. The idea is
that we should be able to add more information about the allocation
in the future in order to allow the allocator to make a better job
of placing the requests on-disk.

There is no functional difference from applying this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# af5c2697 26-Sep-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up reservation removal

The reservation for an inode should be cleared when it is truncated so
that we can start again at a different offset for future allocations.
We could try and do better than that, by resetting the search based on
where the truncation started from, but this is only a first step.

In addition, there are three callers of gfs2_rs_delete() but only one
of those should really be testing the value of i_writecount. While
we get away with that in the other cases currently, I think it would
be better if we made that test specific to the one case which
requires it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0c901809 03-Sep-2013 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: dirty inode correctly in gfs2_write_end

GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when
it actually increased the file size. If gfs2_fsync was called without
I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before
returning, so any metadata or journaled data changes were not getting
fsynced. This meant that writes to the middle of files were not always
getting fsynced properly.

This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been
updated during a write. It also makes gfs2_sync flush the incore log
if I_DIRTY_PAGES is set, and the file is using data journalling. This
will make sure that all incore logged data gets written to disk before
returning from a fsync.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1c8c601a 21-Jun-2013 Jeff Layton <jlayton@kernel.org>

locks: protect most of the file_lock handling with i_lock

Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d81a8ef5 16-May-2013 Al Viro <viro@zeniv.linux.org.uk>

[readdir] convert gfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 6d4ade98 14-Jun-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add atomic_open support

I've restricted atomic_open to only operate on regular files, although
I still don't understand why atomic_open should not be possible also for
directories on GFS2. That can always be added in later though, if it
makes sense.

The ->atomic_open function can be passed negative dentries, which
in most cases means either ENOENT (->lookup) or a call to d_instantiate
(->create). In the GFS2 case though, we need to actually perform the
look up, since we do not know whether there has been a new inode created
on another node. The look up calls d_splice_alias which then tries to
rehash the dentry - so the solution here is to simply check for that
in d_splice_alias. The same issue is likely to affect any other cluster
filesystem implementing ->atomic_open

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "J. Bruce Fields" <bfields fieldses org>
Cc: Jeff Layton <jlayton@redhat.com>


# 2b3dcf35 28-May-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Increase i_writecount during gfs2_setattr_size

This patch calls get_write_access in a few functions. This
merely increases inode->i_writecount for the duration of the function.
That will ensure that any file closes won't delete the inode's
multi-block reservation while the function is running.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a27bb332 07-May-2013 Kent Overstreet <koverstreet@google.com>

aio: don't include aio.h in sched.h

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c2952d20 14-Mar-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix unlock of fcntl locks during withdrawn state

When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur, however these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 496ad9aa 23-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

new helper: file_inode(file)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1d1d1a76 21-Feb-2013 Darrick J. Wong <darrick.wong@oracle.com>

mm: only enforce stable page writes if the backing device requires it

Create a helper function to check if a backing device requires stable
page writes and, if so, performs the necessary wait. Then, make it so
that all points in the memory manager that handle making pages writable
use the helper function. This should provide stable page write support
to most filesystems, while eliminating unnecessary waiting for devices
that don't require the feature.

Before this patchset, all filesystems would block, regardless of whether
or not it was necessary. ext3 would wait, but still generate occasional
checksum errors. The network filesystems were left to do their own
thing, so they'd wait too.

After this patchset, all the disk filesystems except ext3 and btrfs will
wait only if the hardware requires it. ext3 (if necessary) snapshots
pages instead of blocking, and btrfs provides its own bdi so the mm will
never wait. Network filesystems haven't been touched, so either they
provide their own stable page guarantees or they don't block at all.
The blocking behavior is back to what it was before 3.0 if you don't
have a disk requiring stable page writes.

Here's the result of using dbench to test latency on ext2:

3.8.0-rc3:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 WriteX        109347     0.028    59.817
 ReadX         347180     0.004     3.391
 Flush          15514    29.828   287.283

Throughput 57.429 MB/sec 4 clients 4 procs max_latency=287.290 ms

3.8.0-rc3 + patches:
 WriteX        105556     0.029     4.273
 ReadX         335004     0.005     4.112
 Flush          14982    30.540   298.634

Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms

As you can see, the maximum write latency drops considerably with this
patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave
similarly, but see the cover letter for those results.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 350a9b0a 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Split gfs2_trans_add_bh() into two

There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 965c8e59 17-Dec-2012 Andrew Morton <akpm@linux-foundation.org>

lseek: the "whence" argument is called "whence"

But the kernel decided to call it "origin" instead. Fix most of the
sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9dbe9610 31-Oct-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add Orlov allocator

Just like ext3, this works on the root directory and any directory
with the +T flag set. Also, just like ext3, any subdirectory created
in one of the just mentioned cases will be allocated to a random
resource group (GFS2 equivalent of a block group).

If you are creating a set of directories, each of which will contain a
job running on a different node, then by setting +T on the parent
directory before creating the subdirectories, each will land up in a
different resource group, and thus resource group contention between
nodes will be kept to a minimum.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3d162688 05-Nov-2012 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: Don't call file_accessed() with a shared glock

file_accessed() was being called by gfs2_mmap() with a shared glock. If it
needed to update the atime, it was crashing because it dirtied the inode in
gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
checked if the caller was already holding a glock, but it didn't make sure that
the glock was in the exclusive state. Now, instead of calling file_accessed()
while holding the shared lock in gfs2_mmap(), file_accessed() is called after
grabbing and releasing the glock to update the inode. If file_accessed() needs
to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().

gfs2_dirty_inode() now also checks to make sure that if the calling process has
already locked the glock, it has an exclusive lock.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 73738a77 12-Oct-2012 Andrew Price <anprice@redhat.com>

GFS2: Clean up some unused assignments

Cleans up two cases where variables were assigned values but then never
used again.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0b173bc4 08-Oct-2012 Konstantin Khlebnikov <khlebnikov@openvz.org>

mm: kill vma flag VM_CAN_NONLINEAR

Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 71f890f7 30-Jul-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove rs_requested field from reservations

The rs_requested field is left over from the original allocation
code, however this should have been a parameter passed to the
various functions from gfs2_inplace_reserve() and not a member of the
reservation structure as the value is not required after the
initial allocation.

This also helps simplify the code since we no longer need to set
the rs_requested to zero. Also the gfs2_inplace_release()
function can also be simplified since the reservation structure
will always be defined when it is called, and the only remaining
task is to unlock the rgrp if required. It can also now be
called unconditionally too, resulting in a further simplification.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# da1dfb6a 26-Jul-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Make write size hinting code common

This collects up the write size hinting code which is used by the
block reservation subsystem into a single function. At the same
time this also corrects the rounding for this calculation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 39263d5e 12-Jun-2012 Jan Kara <jack@suse.cz>

gfs2: Convert to new freezing mechanism

We update gfs2_page_mkwrite() to use new freeze protection and the transaction
code to use freeze protection while the transaction is running. That is needed
to stop iput() of unlinked file from modifying the filesystem. The rest is
handled by the generic code.

CC: cluster-devel@redhat.com
CC: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a63e9b2e 12-Jun-2012 Jan Kara <jack@suse.cz>

gfs2: Push file_update_time() into gfs2_page_mkwrite()

CC: Steven Whitehouse <swhiteho@redhat.com>
CC: cluster-devel@redhat.com
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 15e1c960 20-Jul-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Eliminate 64-bit divides

This patch removes the 64-bit divides introduced in the previous patch
in favor of shifting, so that it will compile properly on 32-bit machines.
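
As an illustration of the technique (not the code from this patch), a divide by a power-of-two block size can be written as a right shift, which avoids pulling in 64-bit division helpers on 32-bit builds. The helper names and the bsize_shift parameter below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole blocks covered by `bytes`
 * when the block size is (1 << bsize_shift) bytes. Equivalent to
 * bytes / block_size, but expressed as a shift. */
static inline uint64_t toy_bytes_to_blocks(uint64_t bytes,
					   unsigned int bsize_shift)
{
	assert(bsize_shift < 64);
	return bytes >> bsize_shift;
}

/* Same idea, rounding up so a partial trailing block is counted. */
static inline uint64_t toy_bytes_to_blocks_round_up(uint64_t bytes,
						    unsigned int bsize_shift)
{
	uint64_t mask;

	assert(bsize_shift < 64);
	mask = ((uint64_t)1 << bsize_shift) - 1;
	assert(bytes <= UINT64_MAX - mask);	/* no overflow in the sum */
	return (bytes + mask) >> bsize_shift;
}
```

This only works when the divisor is a power of two, which holds for filesystem block sizes.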

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8e2e0047 19-Jul-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Reduce file fragmentation

This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common directory.

The implementation uses an rbtree of reservations attached to each
resource group (and each inode).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 23d0bb83 28-May-2012 Steven Whitehouse <steve@chygwyn.com>

GFS2: Add "top dir" flag support

This patch adds support for the "top dir" flag. Currently this is unused
but a subsequent patch is planned which will add support for the
Orlov allocation policy when allocating subdirectories in a parent
with this flag set.

In order to ensure backward compatible behaviour, mkfs.gfs2 does
not currently tag the root directory with this flag, it must always be
set manually.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5407e242 18-May-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Fold quota data into the reservations struct

This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0a305e49 06-Jun-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Extend the life of the reservations

This patch lengthens the lifespan of the reservations structure for
inodes. Before, they were allocated and deallocated for every write
operation. With this patch, they are allocated when the first write
occurs, and deallocated when the last process closes the file.
It's more efficient to do it this way because it saves GFS2 a lot of
unnecessary allocates and frees. It also gives us more flexibility
for the future: (1) we can now fold the qadata structure back into
the structure and save those alloc/frees, (2) we can use this for
multi-block reservations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# df3fd117 10-Apr-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Rename function gfs2_close to gfs2_release

This patch renames function gfs2_close to gfs2_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2f99c369 23-Mar-2012 Al Viro <viro@zeniv.linux.org.uk>

get rid of pointless includes of ext2_fs.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 58a7d5fb 08-Mar-2012 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: call gfs2_write_alloc_required for each chunk

gfs2_fallocate was calling gfs2_write_alloc_required() once at the start of
the function. This caused problems since gfs2_write_alloc_required used a
long unsigned int for the len, but gfs2_fallocate could allocate a much
larger amount. This patch will move the call into the loop where the
chunks are actually allocated and zeroed out. This will keep the allocation
size under the limit, and also allow gfs2_fallocate to quickly skip over
sections of the file that are already completely allocated.

fallocate_chunk was also not correctly setting the file size. It was using the
len variable to find the last block written to, but by the time it was setting
the size, the len variable had already been decremented to 0.
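
A minimal user-space sketch of the two fixes described above, assuming a toy inode model: the "is allocation required" check runs once per chunk inside the loop, and the end offset is saved before len is consumed. All toy_* names, the chunk limit and the allocated_upto field are hypothetical and do not reflect the real GFS2 structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_CHUNK 4096u	/* assumed per-chunk limit, illustrative only */

struct toy_inode {
	uint64_t size;			/* file size in bytes */
	uint64_t allocated_upto;	/* bytes below this are already allocated */
	unsigned int chunks_allocated;	/* how many chunks needed allocation */
};

/* Stand-in for a per-chunk "does this range need new blocks?" check. */
static bool toy_alloc_required(const struct toy_inode *ip,
			       uint64_t off, uint64_t len)
{
	return off + len > ip->allocated_upto;
}

static void toy_fallocate(struct toy_inode *ip, uint64_t offset, uint64_t len)
{
	uint64_t end;

	assert(offset <= UINT64_MAX - len);
	end = offset + len;	/* saved now: len reaches 0 by the end of the loop */

	while (len > 0) {
		uint64_t chunk = len < TOY_MAX_CHUNK ? len : TOY_MAX_CHUNK;

		/* Checked per chunk, so fully allocated regions are skipped. */
		if (toy_alloc_required(ip, offset, chunk)) {
			ip->allocated_upto = offset + chunk;
			ip->chunks_allocated++;
		}
		offset += chunk;
		len -= chunk;
	}

	if (end > ip->size)
		ip->size = end;
}
```

Computing the size from len after the loop would always see zero, which is the kind of mistake the second paragraph describes.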

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 66fc061b 07-Feb-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: FITRIM ioctl support

The FITRIM ioctl provides an alternative way to send discard requests to
the underlying device. Using the discard mount option results in every
freed block generating a discard request to the block device. This can
be slow, since many block devices can only process discard requests of
larger sizes, and also such operations can be time consuming.

Rather than using the discard mount option, FITRIM allows a sweep of the
filesystem on an occasional basis, and also to optionally avoid sending
down discard requests for smaller regions.

In GFS2 FITRIM will work at resource group granularity. There is a flag
for each resource group which keeps track of which resource groups have
been trimmed. This flag is reset whenever a deallocation occurs in the
resource group, and set whenever a successful FITRIM of that resource
group has taken place. This helps to reduce repeated discard requests
for the same block ranges, again improving performance.
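
The per-resource-group flag logic can be sketched roughly as follows. This is a simplified user-space model, not the GFS2 implementation; the struct and function names are hypothetical, and a counter stands in for actually issuing discard requests:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_rgrp {
	bool trimmed;			/* set after a successful trim */
	unsigned int discards_sent;	/* stands in for real discard requests */
};

/* Any deallocation in the group invalidates its "trimmed" state. */
static void toy_dealloc(struct toy_rgrp *rg)
{
	assert(rg != NULL);
	rg->trimmed = false;
}

/* Sweep all groups, skipping those already trimmed since their last
 * deallocation. Returns how many groups were trimmed in this sweep. */
static size_t toy_fitrim(struct toy_rgrp *rgs, size_t n)
{
	size_t trimmed_now = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (rgs[i].trimmed)
			continue;
		rgs[i].discards_sent++;
		rgs[i].trimmed = true;
		trimmed_now++;
	}
	return trimmed_now;
}
```

A second sweep with no intervening deallocation sends nothing, which is how the flag avoids repeated discards of the same ranges.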

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a365fbf3 24-Feb-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Read resource groups on mount

This makes mount take slightly longer, but at the same time, the first
write to the filesystem will be faster too. It also means that if there
is a problem in the resource index, then we can refuse to mount rather
than having to try and report that when the first write occurs.

In addition, to avoid recursive locking, we have to take account of
instances when the rindex glock may already be held when we are
trying to update the rbtree of resource groups.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9e73f571 17-Feb-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Ensure rindex is uptodate for fallocate

This patch fixes a problem whereby gfs2_grow was failing and causing GFS2
to assert. The problem was that when GFS2's fallocate operation tried to
acquire an "allocation" it made sure the rindex was up to date, and if not,
it called gfs2_rindex_update. However, if the file being fallocated was
the rindex itself, it was already locked at that point. By calling
gfs2_rindex_update at an earlier point in time, we bring rindex up to date
and thereby avoid trying to lock it when the "allocation" is acquired.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2a79f17e 09-Dec-2011 Al Viro <viro@zeniv.linux.org.uk>

vfs: mnt_drop_write_file()

new helper (wrapper around mnt_drop_write()) to be used in pair with
mnt_want_write_file().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# a561be71 23-Nov-2011 Al Viro <viro@zeniv.linux.org.uk>

switch a bunch of places to mnt_want_write_file()

it's both faster (in case when file has been opened for write) and cleaner.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 564e12b1 21-Nov-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: decouple quota allocations from block allocations

This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4442f2e0 21-Nov-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: O_(D)SYNC support for fallocate

Add sync of metadata after fallocate for O_SYNC files to ensure that we
meet expectations for everything being on disk in this case.
Unfortunately, the offset and len parameters are modified during the
course of the fallocate function, so I've had to add a couple of new
variables to call generic_write_sync() at the end.

I know that potentially this will sync data as well within the range,
but I think that is a fairly harmless side-effect overall, since we
would not normally expect there to be any dirty data within the range in
question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Benjamin Marzinski <bmarzins@redhat.com>


# 87654896 08-Nov-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: More automated code analysis fixes

A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dfe4d34b 26-Oct-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: Add readahead to sequential directory traversal

This patch adds read-ahead capability to GFS2's
directory hash table management. It greatly improves
performance for some directory operations. For example:
In one of my file systems that has 1000 directories, each
of which has 1000 files, time to execute a recursive
ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced
from 2m2.814s on a stock kernel to 0m45.938s.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ef3d0fd2 15-Sep-2011 Andi Kleen <ak@linux.intel.com>

vfs: do (nearly) lockless generic_file_llseek

The i_mutex lock use of generic_file_llseek hurts. Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now. The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET. On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use a f_lock.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.
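
The scheme above can be modelled in user space as a rough sketch. This is not the kernel code: the toy_file struct is hypothetical, its size field stands in for i_size_read(), and a counter stands in for taking and releasing file->f_lock so the locking decisions are visible:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_SEEK_SET, TOY_SEEK_CUR, TOY_SEEK_END };

struct toy_file {
	int64_t pos;			/* models file->f_pos */
	int64_t size;			/* models the result of i_size_read() */
	unsigned int f_lock_taken;	/* counts toy f_lock acquisitions */
};

/* Returns the new position, or -1 if it would be negative or the
 * whence value is unknown. */
static int64_t toy_llseek(struct toy_file *f, int64_t offset, int whence)
{
	int64_t newpos;

	assert(f != NULL);

	switch (whence) {
	case TOY_SEEK_SET:
		newpos = offset;		/* no lock needed */
		break;
	case TOY_SEEK_END:
		newpos = f->size + offset;	/* lockless size read */
		break;
	case TOY_SEEK_CUR:
		f->f_lock_taken++;		/* read-modify-write: lock here */
		newpos = f->pos + offset;
		if (newpos >= 0)
			f->pos = newpos;
		/* the toy lock would be released here */
		return newpos < 0 ? -1 : newpos;
	default:
		return -1;
	}

	if (newpos < 0)
		return -1;
	f->pos = newpos;
	return newpos;
}
```

Only the SEEK_CUR path touches the per-file lock in this model; the other two variants proceed without any synchronization.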

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 64dd153c 12-Sep-2011 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: rewrite fallocate code to write blocks directly

GFS2's fallocate code currently goes through the page cache. Since it's only
writing to the end of the file or to holes in it, it doesn't need to, and it
was causing issues on low memory environments. This patch pulls in some of
Steve's block allocation work, and uses it to simply allocate the blocks for
the file, and zero them out at allocation time. It provides a slight
performance increase, and it dramatically simplifies the code.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 13d921e3 07-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up ->page_mkwrite

This patch brings gfs2's ->page_mkwrite uptodate with respect to the
expectations set by the VM. Also added is a check to wait if the fs
is frozen, before we attempt to get a glock. This will only work on
the node which initiates the freeze, but that's ok since the transaction
lock will still provide the expected barrier on other nodes.

The major change here is that we return a locked page now, except when
we don't return a page at all (error cases). This removes the race
which required rechecking the page after it was returned.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@kernel.dk>


# b5b24d7a 07-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix AIL flush issue during fsync

Unfortunately, it is not enough to just ignore locked buffers during
the AIL flush from fsync. We need to be able to ignore all buffers
which are locked, dirty or pinned at this stage as they might have
been added subsequent to the log flush earlier in the fsync function.

In addition, this means that we no longer need to rely on i_mutex to
keep out writes during fsync, so we can, as a side-effect, remove
that protection too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-By: Abhijith Das <adas@redhat.com>


# 54335b1f 01-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Cache the most recently used resource group in the inode

This means that after the initial allocation for any inode, the
last used resource group is cached in the inode for future use.
This drastically reduces the number of lookups of resource
groups in the common case, and thus the contention on that
data structure.

The allocation algorithm is the same as previously, except that we
always check to see if the goal block is within the cached rgrp
first before going to the rbtree to look one up.
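
A minimal sketch of the "check the cached rgrp first" idea, under the assumption of a toy model: a linear scan stands in for the rbtree lookup, and all toy_* names and fields are hypothetical rather than taken from GFS2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_rgrp {
	uint64_t start;		/* first block in the group */
	uint64_t len;		/* number of blocks in the group */
};

struct toy_inode {
	const struct toy_rgrp *cached;	/* last resource group used */
	unsigned int lookups;		/* counts slow-path lookups */
};

static int toy_rgrp_contains(const struct toy_rgrp *rg, uint64_t block)
{
	return block >= rg->start && block - rg->start < rg->len;
}

/* Slow path: a linear scan standing in for the rbtree search. */
static const struct toy_rgrp *toy_slow_lookup(const struct toy_rgrp *rgs,
					      size_t n, uint64_t block)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_rgrp_contains(&rgs[i], block))
			return &rgs[i];
	return NULL;
}

static const struct toy_rgrp *toy_find_rgrp(struct toy_inode *ip,
					    const struct toy_rgrp *rgs,
					    size_t n, uint64_t goal)
{
	const struct toy_rgrp *rg;

	assert(ip != NULL);

	/* Fast path: the goal block lies within the cached group. */
	if (ip->cached && toy_rgrp_contains(ip->cached, goal))
		return ip->cached;

	ip->lookups++;
	rg = toy_slow_lookup(rgs, n, goal);
	if (rg)
		ip->cached = rg;	/* remember for the next allocation */
	return rg;
}
```

Repeated allocations near the same goal block hit the fast path, so the shared lookup structure is consulted only when the goal moves to a different group.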

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9453615a 23-Aug-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added

We need to take the inode's glock whenever the inode's size
is referenced, otherwise it might not be uptodate. Even
though generic_file_llseek_unlocked() doesn't implement
SEEK_DATA, SEEK_HOLE directly, it does reference the inode's
size in those cases, so we need to add them to the list
of origins which need the glock.
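
As a rough sketch of the rule above: the origins whose handling reads the
inode size are the ones that need the glock. The helper name
whence_needs_glock() is hypothetical, and the fallback values for
SEEK_DATA and SEEK_HOLE are the Linux ones, supplied only in case the
libc headers do not expose them.

```c
#include <stdbool.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/* Linux values, used only if the headers did not define them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: true when the seek origin makes the seek code
 * look at the inode size, so the size must be brought up to date under
 * the glock first.
 */
static bool whence_needs_glock(int whence)
{
	switch (whence) {
	case SEEK_END:
	case SEEK_DATA:
	case SEEK_HOLE:
		return true;
	default:
		return false;
	}
}
```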

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>


# ab9bbda0 15-Aug-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use ->dirty_inode()

The aim of this patch is to use the newly enhanced ->dirty_inode()
super block operation to deal with atime updates, rather than
piggy backing that code into ->write_inode() as is currently
done.

The net result is a simplification of the code in various places
and a reduction of the number of gfs2_dinode_out() calls since
this is now implied by ->dirty_inode().

Some of the mark_inode_dirty() calls have been moved under glocks
in order to take advantage of then being able to avoid locking in
->dirty_inode() when we already have suitable locks.

One consequence is that generic_write_end() now correctly deals
with file size updates, so that we do not need a separate check
for that afterwards. This also, indirectly, means that fdatasync
should work correctly on GFS2 - the current code always syncs the
metadata whether it needs to or not.

Has survived testing with postmark (with and without atime) and
also fsx.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f1818529 05-Aug-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix bug trap and journaled data fsync

Journaled data requires that a complete flush of all dirty data for
the file is done, in order that the ail flush which comes after
will succeed.

Also the recently enhanced bug trap can trigger falsely in case
an ail flush from fsync races with a page read. This updates the
bug trap such that it will ignore buffers which are locked and
only trigger on dirty and/or pinned buffers when the ail flush
is run from fsync. The original bug trap is retained when ail
flush is run from ->go_sync().

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2f0264d5 27-Jul-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Split data write & wait in fsync

Now that the data writing is part of fsync proper, we can split
the waiting part out and do it later on. This reduces the
number of waits that we do during fsync on average.

There is also no need to take the i_mutex unless we are flushing
metadata to disk, so we can move that to within the metadata
flushing code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 02c24a82 16-Jul-2011 Josef Bacik <josef@redhat.com>

fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 10556cb2 20-Jun-2011 Al Viro <viro@zeniv.linux.org.uk>

->permission() sanitizing: don't pass flags to ->permission()

not used by the instances anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 9964afbb 16-Jun-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add S_NOSEC support

This adds S_NOSEC support to GFS2. We set/reset the flag either when
a user calls setattr or when we have just regained the glock
from another node. The flag is only set if there are no xattrs
on the inode and there is no suid bit set.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>


# 6905d9e4 26-Apr-2011 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: make sure fallocate bytes is a multiple of blksize

The GFS2 fallocate code chooses a target size for allocating chunks of
space. Whenever it can't find any resource groups with enough space free, it
halves its target. Since this target is in bytes, eventually it will no longer
be a multiple of blksize. As long as there is more space available in the
resource group than the target, this isn't a problem, since GFS2 will use the
actual space available, which is always a multiple of blksize. However,
when GFS2 couldn't fallocate a bigger chunk than the target, it was using the
non-blksize aligned number. This caused a BUG in later code that required
blksize aligned offsets. GFS2 now ensures that bytes is always a multiple of
blksize.
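
The alignment rule can be illustrated with a minimal sketch. It is not the
patch itself: round_down_blksize() and next_target() are hypothetical
helpers, the block size is assumed to be a power of two, and the floor of
one block in next_target() is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round a byte count down to a multiple of blksize.
 * Assumes blksize is a non-zero power of two.
 */
static uint64_t round_down_blksize(uint64_t bytes, uint32_t blksize)
{
	assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
	return bytes & ~((uint64_t)blksize - 1);
}

/*
 * Halve the target after a failed attempt, but keep the result block
 * aligned. The floor of one block is an assumption of this sketch.
 */
static uint64_t next_target(uint64_t target, uint32_t blksize)
{
	uint64_t half = round_down_blksize(target / 2, blksize);

	return half < blksize ? blksize : half;
}
```

With a 4096-byte block size, halving a 12288-byte target gives 6144 bytes,
which is not block aligned; rounding down yields 4096.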

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dba898b0 14-Apr-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up fsync()

This patch is designed to clean up GFS2's fsync
implementation and ensure that it really does get everything on
disk. Since ->write_inode() has been updated, we can call that
via the vfs library function sync_inode_metadata() and the only
remaining thing that has to be done is to ensure that we get
any revoke records in the log after the inode has been written back.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0ee53206 17-Mar-2011 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: directly write blocks past i_size

GFS2 was relying on the writepage code to write out the zeroed data for
fallocate. However, with FALLOC_FL_KEEP_SIZE set, this may be past i_size.
If it is, it will be ignored. To work around this, gfs2 now calls
write_dirty_buffer directly on the buffer_heads when FALLOC_FL_KEEP_SIZE
is set, and it's writing past i_size.

This version is just a cleanup of my last version

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2e149670 23-Mar-2011 Serge E. Hallyn <serge@hallyn.com>

userns: rename is_owner_or_cap to inode_owner_or_capable

And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e4a7b7b0 10-Mar-2011 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: fix block allocation check for fallocate

GFS2 fallocate wasn't properly checking if blocks were already allocated.
In write_empty_blocks(), if a page didn't have buffer_heads attached, GFS2
was always treating it as if there were no blocks allocated for that page.
GFS2 now calls gfs2_block_map() to check if the blocks are allocated before
writing them out.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0a33443b 09-Mar-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove potential race in flock code

This patch ensures that we always wait for glock demotion when
dropping flocks on a file in order to prevent any race
conditions associated with further flock calls or closing
the file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b9c93bb7 02-Feb-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Improve cluster mmap scalability

The mmap system call grabs a glock when an update to atime may be
required. It does this in order to ensure that the flags on the
inode are uptodate, but since it will only mark atime for a future
update, an exclusive lock is not required here (one will be taken
later when the actual update is performed).

Also, the lock can be skipped when the mount is marked noatime in
addition to the original check which only looked at the noatime
flag for the inode itself.

This should increase the scalability of the mmap call when multiple
nodes are all mmaping the same file.
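
The resulting decision can be sketched as below. The enum and the
function mmap_atime_lock() are hypothetical names used only to restate
the rule from this message; they are not the GFS2 implementation.

```c
#include <stdbool.h>

/* Hypothetical lock choices for the mmap path. */
enum mmap_lock {
	MMAP_LOCK_NONE,		/* no atime update possible, skip the glock */
	MMAP_LOCK_SHARED,	/* enough to read up-to-date inode flags */
};

/*
 * Skip the lock if either the mount or the inode is marked noatime;
 * otherwise a shared lock suffices, since the atime is only marked for
 * a later update.
 */
static enum mmap_lock mmap_atime_lock(bool mount_noatime, bool inode_noatime)
{
	if (mount_noatime || inode_noatime)
		return MMAP_LOCK_NONE;
	return MMAP_LOCK_SHARED;
}
```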

Reported-by: Scooter Morris <scooter@cgl.ucsf.edu>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2fe17c10 14-Jan-2011 Christoph Hellwig <hch@lst.de>

fallocate should be a file operation

Currently all filesystems except XFS implement fallocate asynchronously,
while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC
I/O we really want our allocation on disk, especially for the !KEEP_SIZE
case where we actually grow the file with user-visible zeroes. On the
other hand always committing the transaction is a bad idea for fast-path
uses of fallocate like for example in recent Samba versions. Given
that block allocation is a data plane operation anyway change it from
an inode operation to a file operation so that we have the file structure
available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems,
and removing the already unneeded S_ISDIR checks given that we only wire
up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# b74c79e9 06-Jan-2011 Nick Piggin <npiggin@kernel.dk>

fs: provide rcu-walk aware permission i_ops

Signed-off-by: Nick Piggin <npiggin@kernel.dk>


# 51ee4b84 31-Oct-2010 Christoph Hellwig <hch@infradead.org>

locks: let the caller free file_lock on ->setlease failure

The caller allocated it, the caller should free it.

The only issue so far is that we could change the flp pointer even on an
error return if the fl_change callback failed. But we can simply move
the flp assignment after the fl_change invocation, as the callers don't
care about the flp return value if the setlease call failed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 05fa3135 30-Oct-2010 J. Bruce Fields <bfields@redhat.com>

locks: fix setlease methods to free passed-in lock

We modified setlease to require the caller to allocate the new lease in
the case of creating a new lease, but forgot to fix up the filesystem
methods.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6038f373 15-Aug-2010 Arnd Bergmann <arnd@arndb.de>

llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>


# b89f4321 18-Sep-2010 Arnd Bergmann <arnd@arndb.de>

fs/locks.c: prepare for BKL removal

This prepares the removal of the big kernel lock from the
file locking code. We still use the BKL as long as fs/lockd
uses it and ceph might sleep, but we can flip the definition
to a private spinlock as soon as that's done.
All users outside of fs/lockd get converted to use
lock_flocks() instead of lock_kernel() where appropriate.

Based on an earlier patch to use a spinlock from Matthew
Wilcox, who has attempted this a few times before. The
earliest patch, from over 10 years ago, turned it into
a semaphore, which ended up being slower than the BKL
and was subsequently reverted.

Someone should do some serious performance testing when
this becomes a spinlock, since this has caused problems
before. Using a spinlock should be at least as good
as the BKL in theory, but who knows...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: linux-kernel@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org


# bf97b673 27-Sep-2010 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: reserve more blocks for transactions

Some of the functions in GFS2 were not reserving space in the transaction for
the resource group header and the resource groups bitblocks that get added
when you do allocation. GFS2 now makes sure to reserve space for the
resource group header and either all the bitblocks in the resource group, or
one for each block that it may allocate, whichever is smaller using the new
gfs2_rg_blocks() inline function.
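
Read literally, the reservation rule above amounts to the following
arithmetic. This is a sketch of the description only, not the body of
gfs2_rg_blocks(); the function name rg_blocks_to_reserve() and its
parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical calculation: one block for the resource group header,
 * plus the smaller of "all bitmap blocks in the group" and "one bitmap
 * block per block that may be allocated".
 */
static uint32_t rg_blocks_to_reserve(uint32_t bitblocks, uint32_t requested)
{
	uint32_t bitmaps = requested < bitblocks ? requested : bitblocks;

	return 1 + bitmaps;
}
```

For a group with 8 bitmap blocks, a request for 3 blocks reserves 4 blocks
in this model, while a request for 100 blocks reserves 9.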

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a2e0f799 11-Aug-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove i_disksize

With the update of the truncate code, ip->i_disksize and
inode->i_size are merely copies of each other. This means
we can remove ip->i_disksize and use inode->i_size exclusively,
reducing the size of a GFS2 inode by 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 461cb419 24-Jun-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: Simplify gfs2_write_alloc_required

Function gfs2_write_alloc_required always returned zero as its
return code. Therefore, it doesn't need to return a return code
at all. Given that, we can use the return value to return whether
or not the dinode needs block allocations rather than passing
that value in, which in turn simplifies a bunch of error checking.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7ea80859 26-May-2010 Christoph Hellwig <hch@lst.de>

drop unused dentry argument to ->fsync

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7df0e039 24-May-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix permissions checking for setflags ioctl()

We should be checking for the ownership of the file for which
flags are being set, rather than just for write access.

Reported-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 720e7749 10-Mar-2010 Sachin Prabhu <sprabhu@redhat.com>

GFS2: Skip check for mandatory locks when unlocking

gfs2_lock() will skip locks on files which have their mode set to 02666.
This is a problem in cases where the mode of the file is changed after a
process has obtained a lock on the file. Such a lock will be skipped and
will result in a BUG in locks_remove_flock().

gfs2_lock() should skip the check for mandatory locks when unlocking a file.
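
A minimal sketch of the intended check order, assuming the usual
definition of a mandatory-lock mode (setgid set, group execute clear, as
in 02666). The helper names are hypothetical and the real gfs2_lock() does
more than this.

```c
#include <fcntl.h>	/* F_RDLCK, F_WRLCK, F_UNLCK */
#include <stdbool.h>
#include <sys/stat.h>	/* S_ISGID, S_IXGRP, mode_t */

/* Mandatory-lock mode: setgid bit set without group execute. */
static bool is_mandatory_lock_mode(mode_t mode)
{
	return (mode & (S_ISGID | S_IXGRP)) == S_ISGID;
}

/*
 * Hypothetical decision: should a lock request be skipped because the
 * file is marked for mandatory locking? Unlock requests are never
 * skipped, so a lock taken before the mode change can still be released.
 */
static bool skip_lock_request(mode_t mode, short lock_type)
{
	if (lock_type == F_UNLCK)
		return false;
	return is_mandatory_lock_mode(mode);
}
```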

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 56aa616a 08-Dec-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Ensure uptodate inode size when using O_APPEND

The VFS reads the inode size during generic_file_aio_write() but
with no locking around it. In order to get the expected result
from O_APPEND opens, this patch updates the inode size before
calling generic_file_aio_write().

There is of course still a race here, in that there is nothing to
prevent another node coming in and extending the file in the
mean time. On the other hand, when used with file locking this
will ensure that the expected results are obtained.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f0f37e2f 27-Sep-2009 Alexey Dobriyan <adobriyan@gmail.com>

const: mark struct vm_struct_operations

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 40b78a32 26-Aug-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up of extended attribute support

This has been on my list for some time. We need to change the way
in which we handle extended attributes to allow faster file creation
times (by reducing the number of transactions required) and the
extended attribute code is the main obstacle to this.

In addition to that, the VFS provides a way to demultiplex the xattr
calls which we ought to be using, rather than rolling our own. This
patch changes the GFS2 code to use that VFS feature and as a result
the code shrinks by a couple of hundred lines or so, and becomes
easier to read.

I'm planning on doing further clean up work in this area, but this
patch is a good start. The cleaned up code also uses the more usual
"xattr" shorthand, I plan to eliminate the use of "eattr" eventually
and in the mean time it serves as a flag as to which bits of the code
have been updated.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e09f9446 03-Jun-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove unused variable

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a12af1eb 31-May-2009 Abhijith Das <adas@redhat.com>

GFS2: smbd proccess hangs with flock() call.

GFS2 currently does not support mandatory flocks. An flock() call with
LOCK_MAND triggers unexpected behavior because gfs2 is not checking for
this lock type. This patch corrects that.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b1e71b06 22-May-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up some file names

This patch renames the ops_*.c files which have no counterpart
without the ops_ prefix in order to shorten the name and make
it more readable. In addition, ops_address.h (which was very
small) is moved into inode.h and inode.h is cleaned up by
adding extern where required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>