History log of /linux-master/fs/gfs2/bmap.c
Revision Date Author Comments
# c95346ac 11-Mar-2024 Andrew Price <anprice@redhat.com>

gfs2: Fix invalid metadata access in punch_hole

In punch_hole(), when the offset lies in the final block for a given
height, there is no hole to punch, but the maximum size check fails to
detect that. Consequently, punch_hole() will try to punch a hole beyond
the end of the metadata and fail. Fix the maximum size check.

Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 19871b5c 07-Dec-2023 Christoph Hellwig <hch@lst.de>

iomap: pass the length of the dirty region to ->map_blocks

Let the file system know how much dirty data exists at the passed
in offset. This allows file systems to allocate the right amount
of space that actually is written back if they can't eagerly
convert (e.g. because they don't support unwritten extents).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231207072710.176093-15-hch@lst.de
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 4c7b3f7f 20-Oct-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Get rid of gfs2_alloc_blocks generation parameter

Get rid of the generation parameter of gfs2_alloc_blocks(): we only ever
set the generation of the current inode while creating it, so do so
directly.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 92099f0c 19-Oct-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add metapath_dibh helper

Add a metapath_dibh() helper for extracting the inode's buffer head from
a metapath.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b4bf3d5c 14-Sep-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove unused gfs2_extent_length argument

The limit argument of gfs2_extent_length() is unused.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0a88810d 16-Oct-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

buffer: remove folio_create_empty_buffers()

With all users converted, remove the old create_empty_buffers() and rename
folio_create_empty_buffers() to create_empty_buffers().

Link: https://lkml.kernel.org/r/20231016201114.1928083-28-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 81cb277e 16-Oct-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

gfs2: convert inode unstuffing to use a folio

Use the folio APIs, removing numerous hidden calls to compound_head().
Also remove the stale comment about the page being looked up if it's NULL.

Link: https://lkml.kernel.org/r/20231016201114.1928083-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 580f721b 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

gfs2: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-38-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# dc0b9435 26-Jul-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs: Don't use GFP_NOFS in gfs2_unstuff_dinode

Revert the rest of commit 220cca2a4f58 ("GFS2: Change truncate page
allocation to be GFP_NOFS"):

In gfs2_unstuff_dinode(), there is no need to carry out the page cache
allocation under GFP_NOFS because inodes on the "regular" filesystem are
never un-inlined under memory pressure, so switch back from
find_or_create_page() to grab_cache_page() here as well.

Inodes on the "metadata" filesystem can theoretically be un-inlined
under memory pressure, but any page cache allocations in that context
would happen in GFP_NOFS context because those inodes have
inode->i_mapping->gfp_mask set to GFP_NOFS (see the previous patch).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d6bb59a9 19-May-2023 Matthew Wilcox (Oracle) <willy@infradead.org>

iomap: Create large folios in the buffered write path

Use the size of the write as a hint for the size of the folio to create.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 8a8b8d91 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

gfs2: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-45-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# e4f82bf2 26-Apr-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: fix minor comment typos

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7d1b3778 07-Apr-2023 Bob Peterson <rpeterso@redhat.com>

gfs2: Eliminate gfs2_trim_blocks

Function gfs2_trim_blocks is not referenced. Eliminate it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c1b0c3cf 01-Feb-2023 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Convert gfs2_page_add_databufs to folios

Convert gfs2_page_add_databufs() to folios and rename it to
gfs2_trans_add_databufs().

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 471859f5 15-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

iomap: Rename page_ops to folio_ops

The operations in struct page_ops all operate on folios, so rename
struct page_ops to struct folio_ops.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: port around not removing iomap_valid]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# c82abc23 15-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

iomap: Rename page_prepare handler to get_folio

The ->page_prepare() handler in struct iomap_page_ops is now somewhat
misnamed, so rename it to ->get_folio().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 9060bc4d 15-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

iomap/gfs2: Get page in page_prepare handler

Change the iomap ->page_prepare() handler to get and return a locked
folio instead of doing that in iomap_write_begin(). This allows to
recover from out-of-memory situations in ->page_prepare(), which
eliminates the corresponding error handling code in iomap_write_begin().
The ->put_folio() handler now also isn't called with NULL as the folio
value anymore.

Filesystems are expected to use the iomap_get_folio() helper for getting
locked folios in their ->page_prepare() handlers.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 40405ddd 15-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

iomap: Rename page_done handler to put_folio

The ->page_done() handler in struct iomap_page_ops is now somewhat
misnamed in that it mainly deals with unlocking and putting a folio, so
rename it to ->put_folio().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 80baab88 15-Jan-2023 Andreas Gruenbacher <agruenba@redhat.com>

iomap/gfs2: Unlock and put folio in page_done handler

When an iomap defines a ->page_done() handler in its page_ops, delegate
unlocking the folio and putting the folio reference to that handler.

This allows to fix a race between journaled data writes and folio
writeback in gfs2: before this change, gfs2_iomap_page_done() was called
after unlocking the folio, so writeback could start writing back the
folio's buffers before they could be marked for writing to the journal.
Also, try_to_free_buffers() could free the buffers before
gfs2_iomap_page_done() was done adding the buffers to the current
current transaction. With this change, gfs2_iomap_page_done() adds the
buffers to the current transaction while the folio is still locked, so
the problems described above can no longer occur.

The only current user of ->page_done() is gfs2, so other filesystems are
not affected. To catch out any out-of-tree users, switch from a page to
a folio in ->page_done().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 70376c7f 04-Dec-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Always check inode size of inline inodes

Check if the inode size of stuffed (inline) inodes is within the allowed
range when reading inodes from disk (gfs2_dinode_in()). This prevents
us from on-disk corruption.

The two checks in stuffed_readpage() and gfs2_unstuffer_page() that just
truncate inline data to the maximum allowed size don't actually make
sense, and they can be removed now as well.

Reported-by: syzbot+7bb81dfa9cda07d9cd9d@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 1420c4a5 14-Jul-2022 Bart Van Assche <bvanassche@acm.org>

fs/buffer: Combine two submit_bh() and ll_rw_block() arguments

Both submit_bh() and ll_rw_block() accept a request operation type and
request flags as their first two arguments. Micro-optimize these two
functions by combining these first two arguments into a single argument.
This patch does not change the behavior of any of the modified code.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Song Liu <song@kernel.org> (for the md changes)
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d031a886 14-Apr-2022 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix filesystem block deallocation for short writes

When a write cannot be carried out in full, gfs2_iomap_end() releases
blocks that have been allocated for this write but haven't been used.

To compute the end of the allocation, gfs2_iomap_end() incorrectly
rounded the end of the attempted write down to the next block boundary
to arrive at the end of the allocation. It would have to round up, but
the end of the allocation is also available as iomap->offset +
iomap->length, so just use that instead.

In addition, use round_up() for computing the start of the unused range.

Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b2963932 03-Mar-2022 Bob Peterson <rpeterso@redhat.com>

gfs2: Remove return value for gfs2_indirect_init

The return value from function gfs2_indirect_init is never used, so
remove it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7336905a 10-Dec-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_setattr_size error path fix

When gfs2_setattr_size() fails, it calls gfs2_rs_delete(ip, NULL) to get
rid of any reservations the inode may have. Instead, it should pass in
the inode's write count as the second parameter to allow
gfs2_rs_delete() to figure out if the inode has any writers left.

In a next step, there are two instances of gfs2_rs_delete(ip, NULL) left
where we know that there can be no other users of the inode. Replace
those with gfs2_rs_deltree(&ip->i_res) to avoid the unnecessary write
count check.

With that, gfs2_rs_delete() is only called with the inode's actual write
count, so get rid of the second parameter.

Fixes: a097dc7e24cb ("GFS2: Make rgrp reservations part of the gfs2_inode structure")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# f3506eee 05-Nov-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix length of holes reported at end-of-file

Fix the length of holes reported at the end of a file: the length is
relative to the beginning of the extent, not the seek position which is
rounded down to the filesystem block size.

This bug went unnoticed for some time, but is now caught by the
following assertion in iomap_iter_done():

WARN_ON_ONCE(iter->iomap.offset + iter->iomap.length <= iter->pos)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b924bdab 11-Aug-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Move the inode glock locking to gfs2_file_buffered_write

So far, for buffered writes, we were taking the inode glock in
gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
not holding the inode glock while iomap_write_actor faults in user
pages. It turns out that iomap_write_actor is called inside iomap_begin
... iomap_end, so the user pages were still faulted in while holding the
inode glock and the locking code in iomap_begin / iomap_end was
completely pointless.

Move the locking into gfs2_file_buffered_write instead. We'll take care
of the potential deadlocks due to faulting in user pages while holding a
glock in a subsequent patch.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 1d25d0ae 10-Aug-2021 Christoph Hellwig <hch@lst.de>

iomap: remove the iomap arguments to ->page_{prepare,done}

These aren't actually used by the only instance implementing the methods.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 7a607a41 17-Jun-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up gfs2_unstuff_dinode

Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little. All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c551f66c 30-Mar-2021 Lee Jones <lee.jones@linaro.org>

gfs2: Fix a number of kernel-doc warnings

Building the kernel with W=1 results in a number of kernel-doc warnings
like incorrect function names and parameter descriptions. Fix those,
mostly by adding missing parameter descriptions, removing left-over
descriptions, and demoting some less important kernel-doc comments into
regular comments.

Originally proposed by Lee Jones; improved and combined into a single
patch by Andreas.

Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 6d8da302 25-Mar-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_buffer

Instead of only supporting GFS2_METATYPE_DI and GFS2_METATYPE_IN blocks,
make the block type a parameter of gfs2_meta_indirect_buffer and rename
the function to gfs2_meta_buffer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 152f58c9 27-Mar-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent

We don't need two very similar functions for mapping logical blocks to physical
blocks.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 9153dac1 31-Mar-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extent

Convert gfs2_extent_map to iomap and split it into gfs2_get_extent and
gfs2_alloc_extent. Instead of hardcoding the extent size, pass it in
via the extlen parameter.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 54992257 27-Mar-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add new gfs2_iomap_get helper

Rename the current gfs2_iomap_get and gfs2_iomap_alloc functions to __*.
Add a new gfs2_iomap_get helper that doesn't expose struct metapath.
Rename gfs2_iomap_get_alloc to gfs2_iomap_alloc. Use the new helpers
where they make sense.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 4fc7ec31 24-Apr-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: Use resource group glock sharing

This patch takes advantage of the new glock holder sharing feature for
resource groups. We have already introduced local resource group
locking in a previous patch, so competing accesses of local processes
are already under control.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7009fa9c 09-Feb-2021 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end

When starting an iomap write, gfs2_quota_lock_check -> gfs2_quota_lock
-> gfs2_quota_hold is called from gfs2_iomap_begin. At the end of the
write, before unlocking the quotas, punch_hole -> gfs2_quota_hold can be
called again in gfs2_iomap_end, which is incorrect and leads to a failed
assertion. Instead, move the call to gfs2_quota_unlock before the call
to punch_hole to fix that.

Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c65b76b8 11-Oct-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Only use struct gfs2_rbm for bitmap manipulations

GFS2 uses struct gfs2_rbm to represent a filesystem block number as a
bit position within a resource group. This representation is used in
the bitmap manipulation code to prevent excessive conversions between
block numbers and bit positions, but also in struct gfs2_blkreserv which
is part of struct gfs2_inode, to mark the start of a reservation. In
the inode, the bit position representation makes less sense: first, the
start position is used as a block number about as often as a bit
position; second, the bit position representation makes the code
unnecessarily complicated and difficult to read.

Therefore, change struct gfs2_blkreserv to represent the start of a
reservation as a block number instead of a bit position. (This requires
keeping track of the resource group in gfs2_blkreserv separately.) With
that change, various things can be slightly simplified, and struct
gfs2_rbm can be moved to rgrp.c.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# d3039c06 11-Nov-2020 Bob Peterson <rpeterso@redhat.com>

Revert "gfs2: Ignore journal log writes for jdata holes"

This reverts commit b2a846dbef4ef54ef032f0f5ee188c609a0278a7.

That commit changed the behavior of function gfs2_block_map to return
-ENODATA in cases where a hole (IOMAP_HOLE) is encountered and create is
false. While that fixed the intended problem for jdata, it also broke
other callers of gfs2_block_map such as some jdata block reads. Before
the patch, an encountered hole would be skipped and the buffer seen as
unmapped by the caller. The patch changed the behavior to return
-ENODATA, which is interpreted as an error by the caller.

The -ENODATA return code should be restricted to the specific case where
jdata holes are encountered during ail1 writes. That will be done in a
later patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b2a846db 26-Aug-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Ignore journal log writes for jdata holes

When flushing out its ail1 list, gfs2_write_jdata_page calls function
__block_write_full_page passing in function gfs2_get_block_noalloc.
But there was a problem when a process wrote to a jdata file, then
truncated it or punched a hole, leaving references to the blocks within
the new hole in its ail list, which are to be written to the journal log.

In writing them to the journal, after calling gfs2_block_map, function
gfs2_get_block_noalloc determined that the (hole-punched) block was not
mapped, so it returned -EIO to generic_writepages, which passed it back
to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
there was a real IO error writing to the journal.

This might be a valid error when writing metadata to the journal, but for
journaled data writes, it does not warrant a withdraw.

This patch adds a check to function gfs2_block_map that makes an exception
for journaled data writes that correspond to jdata holes: If the iomap
get function returns a block type of IOMAP_HOLE, it instead returns
-ENODATA which does not cause the withdraw. Other errors are returned as
before.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# a6645745 19-Aug-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: simplify gfs2_block_map

Function gfs2_block_map had a lot of redundancy between its create and
no_create paths. This patch simplifies the code to eliminate the redundancy.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2164f9b9 01-Jul-2019 Christoph Hellwig <hch@lst.de>

gfs2: use iomap for buffered I/O in ordered and writeback mode

Switch to using the iomap readpage and writepage helpers for all I/O in
the ordered and writeback modes, and thus eliminate using buffer_heads
for I/O in these cases. The journaled data mode is left untouched.

(Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
to set_page_dirty instead of accidentally leaving the page / buffer clean.)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>


# 70499cdf 23-Jul-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Never call gfs2_block_zero_range with an open transaction

Before this patch, some functions started transactions then they called
gfs2_block_zero_range. However, gfs2_block_zero_range, like writes, can
start transactions, which results in a recursive transaction error.
For example:

do_shrink
trunc_start
gfs2_trans_begin <------------------------------------------------
gfs2_block_zero_range
iomap_zero_range(inode, from, length, NULL, &gfs2_iomap_ops);
iomap_apply ... iomap_zero_range_actor
iomap_begin
gfs2_iomap_begin
gfs2_iomap_begin_write
actor (iomap_zero_range_actor)
iomap_zero
iomap_write_begin
gfs2_iomap_page_prepare
gfs2_trans_begin <------------------------

This patch reorders the callers of gfs2_block_zero_range so that they
only start their transactions after the call. It also adds a BUG_ON to
ensure this doesn't happen again.

Fixes: 2257e468a63b ("gfs2: implement gfs2_block_zero_range using iomap_zero_range")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# 566a2ab3 20-Apr-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Another gfs2_walk_metadata fix

Make sure we don't walk past the end of the metadata in gfs2_walk_metadata: the
inode holds fewer pointers than indirect blocks.

Slightly clean up gfs2_iomap_get.

Fixes: a27a0c9b6a20 ("gfs2: gfs2_walk_metadata fix")
Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 1595548f 06-Mar-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put

Keeping reservations and quotas separate helps reviewing the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 2fba46a0 26-Feb-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: Change inode qa_data to allow multiple users

Before this patch, multiple users called gfs2_qa_alloc which allocated
a qadata structure to the inode, if quotas are turned on. Later, in
file close or evict, the structure was deleted with gfs2_qa_delete.
But there can be several competing processes who need access to the
structure. There were races between file close (release) and the others.
Thus, a release could delete the structure out from under a process
that relied upon its existence. For example, chown.

This patch changes the management of the qadata structures to be
a get/put scheme. Function gfs2_qa_alloc has been changed to gfs2_qa_get
and if the structure is allocated, the count essentially starts out at
1. Function gfs2_qa_delete has been renamed to gfs2_qa_put, and the
last guy to decrement the count to 0 frees the memory.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# d580712a 06-Mar-2020 Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc

Before this patch, multiple callers called gfs2_rsqa_alloc to force
the existence of a reservations structure and a quota data structure
if needed. However, now the reservations are handled separately, so
the quota data is only the quota data. So we eliminate the one in
favor of just calling gfs2_qa_alloc directly.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 969183bc 03-Feb-2020 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Switch to list_{first,last}_entry

Replace open-coded versions of list_first_entry and list_last_entry with those
functions.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 39c3a948 06-Sep-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Improve mmap write vs. punch_hole consistency

When punching a hole in a file, use filemap_write_and_wait_range to
write back any dirty pages in the range of the hole. As a side effect,
if the hole isn't page aligned, this marks unaligned pages at the
beginning and the end of the hole read-only. This is required when the
block size is smaller than the page size: when those pages are written
to again after the hole punching, we must make sure that page_mkwrite is
called for those pages so that the page will be fully allocated and any
blocks turned into holes from the hole punching will be reallocated.
(If a page is writably mapped, page_mkwrite won't be called.)

Fixes xfstest generic/567.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c039b997 18-Oct-2019 Goldwyn Rodrigues <rgoldwyn@suse.com>

iomap: use a srcmap for a read-modify-write I/O

The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target. The srcmap is only supported for buffered writes so far.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>


# f0b444b3 12-Sep-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps

In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.

This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.

Fixes: d552a2b9b33e ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# b473bc2d 06-Sep-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Improve mmap write vs. truncate consistency

On filesystems with a block size smaller than PAGE_SIZE, page_mkwrite is
called for each memory-mapped page before that page can be written to.
When such a memory-mapped file is truncated down to size x which is not
a multiple of the page size and then back to a larger size, the page
straddling size x can end up with a partial block mapping. In that
case, make sure to mark that page read-only so that page_mkwrite will be
called before the page can be written to the next time.

(There is no point in marking the page straddling size x read-only when
truncating down as writing to memory beyond the end of the file will
result in SIGBUS instead of growing the file.)

Fixes xfstests generic/029, generic/030 on filesystems with a block size
smaller than PAGE_SIZE.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2257e468 01-Jul-2019 Christoph Hellwig <hch@lst.de>

gfs2: implement gfs2_block_zero_range using iomap_zero_range

iomap handles all the nitty-gritty details of zeroing a file
range for us, so use the proper helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 72d36d05 12-Jul-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add support for IOMAP_ZERO

Add support for the IOMAP_ZERO iomap operation so that iomap_zero_range will
work as expected. In the IOMAP_ZERO case, the caller of iomap_zero_range is
responsible for taking an exclusive glock on the inode, so we need no
additional locking in gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 34aad20b 05-Jul-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_iomap_begin cleanup

Following commit d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock"),
gfs2_iomap_begin and gfs2_iomap_begin_write can be further cleaned up and the
split between those two functions can be improved.

With suggestions from Christoph Hellwig.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# a27a0c9b 04-Aug-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_walk_metadata fix

It turns out that the current version of gfs2_metadata_walker suffers
from multiple problems that can cause gfs2_hole_size to report an
incorrect size. This will confuse fiemap as well as lseek with the
SEEK_DATA flag.

Fix that by changing gfs2_hole_walker to compute the metapath to the
first data block after the hole (if any), and compute the hole size
based on that.

Fixes xfstest generic/490.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
Cc: stable@vger.kernel.org # v4.18+


# 706cb549 27-Jul-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Inode dirtying fix

With the recent iomap write page reclaim deadlock fix, it turns out that the
GLF_DIRTY flag isn't always set when it needs to be anymore: previously, this
happened as a side effect of always adding the inode buffer head to the current
transaction with gfs2_trans_add_meta, but this isn't happening consistently
anymore. Fix by removing an additional unnecessary gfs2_trans_add_meta call
and by setting the GLF_DIRTY flag in gfs2_iomap_end.

(The GLF_DIRTY flag causes inode_go_sync to flush the transaction log when
syncing out the glock of that inode. When the flag isn't set, inode_go_sync
will skip inodes, including ones with an i_state of I_DIRTY_PAGES, which will
lead to cluster incoherency.)

In addition, in gfs2_iomap_page_done, if the metadata has changed, mark the
inode as I_DIRTY_DATASYNC to have the inode added to the current transaction:
we don't expect metadata to change here, but let's err on the safe side.

Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock");
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# bb4cb25d 03-Jul-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove unused gfs2_iomap_alloc argument

Remove the unused flags argument of gfs2_iomap_alloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 8d3e72a1 27-Jun-2019 Andreas Gruenbacher <agruenba@redhat.com>

iomap: don't mark the inode dirty in iomap_write_end

Marking the inode dirty for each page copied into the page cache can be
very inefficient for file systems that use the VFS dirty inode tracking,
and is completely pointless for those that don't use the VFS dirty inode
tracking. So instead, only set an iomap flag when changing the in-core
inode size, and open code the rest of __generic_write_end.

Partially based on code from Christoph Hellwig.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f29e62ee 13-May-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: replace more printk with calls to fs_info and friends

This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 2741b672 08-Jun-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix rounding error in gfs2_iomap_page_prepare

The pos and len arguments to the iomap page_prepare callback are not
block aligned, so we need to take that into account when computing the
number of blocks.

Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7336d0e6 31-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398

Based on 1 normalized pattern(s):

this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# d0a22a4b 29-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix iomap write page reclaim deadlock

Since commit 64bc06bb32ee ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end. This approach suffers from
two problems:

(1) Any allocations necessary for the write are done in iomap_begin, so when
the data aren't journaled, there is no need for keeping the transaction open
until iomap_end.

(2) Transactions keep the gfs2 log flush lock held. When
iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
gfs2_write_inode, which will try to flush the log. This requires taking the
log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end. Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32ee ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# fbb27873 04-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}

Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
such thing as an "unrevoke" object; all this function does is remove
existing revoke objects plus some bookkeeping.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 7c70b896 25-Mar-2019 Bob Peterson <rpeterso@redhat.com>

gfs2: clean_journal improperly set sd_log_flush_head

This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# df0db3ec 30-Apr-2019 Andreas Gruenbacher <agruenba@redhat.com>

iomap: Add a page_prepare callback

Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare calback to be called before the next page is written
to. In gfs2, we'll want to start a transaction in page_prepare and end
it in page_done. Other filesystems that implement data journaling will
require the same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 0a4c9265 23-Jan-2019 Gustavo A. R. Silva <gustavo@embeddedor.com>

fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>


# bc020561 18-Dec-2018 Bob Peterson <rpeterso@redhat.com>

gfs2: take jdata unstuff into account in do_grow

Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>


# 98583b3e 09-Nov-2018 Abhi Das <adas@redhat.com>

gfs2: add more timing info to journal recovery process

Tells you how many milliseconds map_journal_extents and find_jhead
take.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# c26b5aa8 11-Nov-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix iomap buffer head reference counting bug

GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private. It sets that private pointer in
gfs2_iomap_get. Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.

To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e7445ced 08-Nov-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix metadata read-ahead during truncate (2)

The previous attempt to fix for metadata read-ahead during truncate was
incorrect: for files with a height > 2 (1006989312 bytes with a block
size of 4096 bytes), read-ahead requests were not being issued for some
of the indirect blocks discovered while walking the metadata tree,
leading to significant slow-downs when deleting large files. Fix that.

In addition, only issue read-ahead requests in the first pass through
the meta-data tree, while deallocating data blocks.

Fixes: c3ce5aa9b0 ("gfs2: Fix metadata read-ahead during truncate")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# fee5150c 10-Oct-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix iomap buffered write support for journaled files (2)

It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.

Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed"). We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.

Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.

This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 0ddeded4 04-Oct-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Pass resource group to rgblk_free

Function rgblk_free can only deal with one resource group at a time, so
pass that resource group is as a parameter. Several of the callers
already have the resource group at hand, so we only need additional
lookup code in a few places.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>


# dc480feb 09-Oct-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix iomap buffered write support for journaled files

Commit 64bc06bb32ee broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done. However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs. Fix
that by creating buffer heads ourself when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 77612578 25-Jul-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Special-case rindex for gfs2_grow

To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated. This assumption
is incorrect for preallocates files, but there are no negative
consequences as long as *some* space is still left on the filesystem.

One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.

However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated. To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 967bcc91 19-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: iomap direct I/O support

The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# bcfe9413 11-May-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: gfs2_extent_length cleanup

Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 64bc06bb 24-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: iomap buffered write support

With the traditional page-based writes, blocks are allocated separately
for each page written to. With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# d505a96a 24-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Further iomap cleanups

In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage. This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Further clean up gfs2_iomap_get and implement gfs2_stuffed_iomap here
directly. For reads beyond the end of the file, return holes instead of
failing with -ENOENT so that we can get rid of that special case in
gfs2_block_map.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>


# 00251a16 18-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Minor clarification to __gfs2_punch_hole

Rename end_off to end_len to make the code less confusing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 628e366d 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Iomap cleanups and improvements

Clean up gfs2_iomap_alloc and gfs2_iomap_get. Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 845802b1 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove ordered write mode handling from gfs2_trans_add_data

In journaled data mode, we need to add each buffer head to the current
transaction. In ordered write mode, we only need to add the inode to
the ordered inode list. So far, both cases are handled in
gfs2_trans_add_data. This makes the code look misleading and is
inefficient for small block sizes as well. Handle both cases separately
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 7841b9f0 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: hole_size improvement

Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size. While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 07e23d68 04-Jun-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Update find_metapath comment

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 7ee66c03 01-Jun-2018 Christoph Hellwig <hch@lst.de>

iomap: move IOMAP_F_BOUNDARY to gfs2

Just define a range of fs specific flags and use that in gfs2 instead of
exposing this internal flag globally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 19319b53 01-Jun-2018 Christoph Hellwig <hch@lst.de>

iomap: inline data should be an iomap type, not a flag

Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address. So instead of having a flag for it
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 9a38662b 16-Apr-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove sdp->sd_jheightsize

GFS2 keeps two arrarys in the superblock that define the maximum size of
an inode depending on the inode's height: sdp->sd_heightsize defines the
heights in units of sb->s_blocksize; sdp->sd_jheightsize defines them in
units of sb->s_blocksize - sizeof(struct gfs2_meta_header). These
arrays are used to determine when additional layers of indirect blocks
are needed. The second array is used for directories which have an
additional gfs2_meta_header at the beginning of each block.

Distinguishing between these two cases makes no sense: the height
required for representing N blocks will come out the same no matter if
the calculation is done in gross (sb->s_blocksize) or net
(sb->s_blocksize - sizeof(struct gfs2_meta_header)) units.

Stuffed directories don't have an additional gfs2_meta_header, but the
stuffed case is handled separately for both files and directories,
anyway.

Remove the unncessary sdp->sd_jheightsize array.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 3e7aafc3 06-Apr-2018 Bob Peterson <rpeterso@redhat.com>

GFS2: Minor improvements to comments and documentation

This patch simply fixes some comments and the gfs2-glocks.txt file:
Places where i_rwsem was called i_mutex, and adding i_rw_mutex.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# fffb6412 29-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Zero out fallocated blocks in fallocate_chunk

Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack. This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag. I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>


# bb491ce6 23-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Check for the end of metadata in punch_hole

When punching a hole or truncating an inode down to a given size, also
check if the truncate point / start of the hole is within the range we
have metadata for. Otherwise, we can end up freeing blocks that
shouldn't be freed, corrupting the inode, or crashing the machine when
trying to punch a hole into the void.

When growing an inode via truncate, we set the new size but we don't
allocate additional levels of indirect blocks and grow the inode height.
When shrinking that inode again, the new size may still point beyond the
end of the inode's metadata.

Fixes xfstest generic/476.

Debugged-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# d39d18e0 05-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Improve gfs2_block_map comment

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 3b5da96e 05-Mar-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fixes to "Implement iomap for block_map" (2)

It turns out that commit 3229c18c0d6b2 'Fixes to "Implement iomap for
block_map"' introduced another bug in gfs2_iomap_begin that can cause
gfs2_block_map to set bh->b_size of an actual buffer to 0. This can
lead to arbitrary incorrect behavior including crashes or disk
corruption. Revert the incorrect part of that commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 49edd5bf 06-Feb-2018 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fixes to "Implement iomap for block_map"

It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense. After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately. Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 235628c5 14-Nov-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Add gfs2_max_stuffed_size

Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 4e56a641 14-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE)

Implement the top-level bits of punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 10d2cf94 18-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Turn trunc_dealloc into punch_hole

Add an upper bound to the range of blocks to deallocate blocks to
function trunc_dealloc so that this function can be used for truncating
a file as well as for punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 5cf26b1e 10-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Generalize truncate code

Pull the code for computing the range of metapointers to iterate out of
gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating
metapointers within a block), and trunc_dealloc (for walking the
metadata tree).

In sweep_bh_for_rgrps, move the code for looking up the resource group
descriptor of the current resource group out of the inner loop. The
metatype check moves to trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# bdba0d5e 13-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

Turn gfs2_block_truncate_page into gfs2_block_zero_range

Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block. This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# cb7f0903 04-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Improve non-recursive delete algorithm

In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks. This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.

The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept. With that change, computing which indirect blocks are still
needed becomes relatively easy.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# c3ce5aa9 08-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Fix metadata read-ahead during truncate

The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N. However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1. Consequently, all blocks below height
N - 1 will be accessed without read-ahead.

Fix this by issuing read-aheads as early as possible, after each
metapath lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# e8b43fe0 08-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up {lookup,fillup}_metapath

Split out the entire lookup loop from lookup_metapath and
fillup_metapath. Make both functions return the actual height in
mp->mp_aheight, and return 0 on success. Handle lookup errors properly
in trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# e7fdf004 12-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Remove minor gfs2_journaled_truncate inefficiencies

First, this function truncates the file in chunks. When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned. This is inefficient.

Second, this function doesn't recognize where holes are, so it loops
through them. For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty. (An better fix would be to skip large holes, of course.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 8b5860a3 12-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: truncate: Remove unnecessary oldsize parameters

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 80990f40 12-Dec-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clean up trunc_start error path

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 90bcab99 22-Dec-2017 Steven Whitehouse <swhiteho@redhat.com>

gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use

Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching. Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 3974320c 16-Feb-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Implement iomap for block_map

This patch implements iomap for block mapping, and switches the
block_map function to use it under the covers.

The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
reached a "metadata boundary" and fetching the next mapping is likely to
incur an additional I/O. This flag is used for setting the bh buffer
boundary flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 5f8bd444 28-Oct-2016 Bob Peterson <rpeterso@redhat.com>

GFS2: Make height info part of metapath

This patch eliminates height parameters from function gfs2_bmap_alloc.
Function find_metapath determines the metapath's "find height", also
known as the desired height. Function lookup_metapath determines the
metapath's "actual height", previously known as starting height or
sheight. Function gfs2_bmap_alloc now gets both height values from
the metapath. This simplification was done as a step toward switching
the block_map functions to using iomap. The bh_map responsibilities
are also removed from function gfs2_bmap_alloc for the same reason.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>


# 20cdc193 22-Sep-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Clarify gfs2_block_map

Add a comment about the logical block size for directories. Rename
"bsize" in gfs2_block_map to "factor". Fix a typo in the description of
metaptr1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# c4a9d189 30-Aug-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Fix non-recursive truncate bug

Before this patch if you truncated a file to a smaller size it
wasn't freeing all the blocks properly. There are two reasons.

First, the metapath comparison was not comparing previous heights.
I added a function, mp_eq_to_hgt, which checks the metapath at
all heights prior to the target height.

Second, in function find_nonnull_ptr, it needed to zero out all
pointers for heights following the target height. Translated into
decimal integer terms, this way a number like 299, when incremented,
becomes 300, not 399. The 2 gets incremented to 3, and the following
digits need to be reset.

These two things allow the truncate state machine to properly find
the blocks it needs to delete.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# e477b24b 21-Jul-2017 Coly Li <colyli@suse.de>

gfs2: add flag REQ_PRIO for metadata I/O

When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
the bio. But flag REQ_META is just a hint for block trace, not for block
layer code to handle a bio as metadata request.

For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio
would be very informative to block layer code. For example, if bcache is
used as a I/O cache for gfs2, it will be possible for bcache code to get
the hint and cache the pre-fetched metadata blocks on cache device. This
behavior may be helpful to improve metadata I/O performance if the
following requests hit the cache.

Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
- All places where REQ_READAHEAD is used, gfs2 code uses this flag for
metadata read ahead.
- In gfs2_meta_rq() where the first metadata block is read in.
- In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
up to date.
These metadata blocks are probably to be accessed again in future, adding
a REQ_PRIO flag may have bcache to keep such metadata in fast cache
device. For system without a cache layer, REQ_PRIO can still provide hint
to block layer to handle metadata requests more properly.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 6f6597ba 30-Jun-2017 Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Protect gl->gl_object by spin lock

Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b32c8c76 08-May-2017 Stephen Rothwell <sfr@canb.auug.org.au>

gfs2: replace CURRENT_TIME with current_time

Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d552a2b9 06-Feb-2017 Bob Peterson <rpeterso@redhat.com>

GFS2: Non-recursive delete

Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 2fcf5cc3 16-Dec-2016 Bob Peterson <rpeterso@redhat.com>

GFS2: Limit number of transaction blocks requested for truncates

This patch limits the number of transaction blocks requested during
file truncates. If we have very large multi-terabyte files, and want
to delete or truncate them, they might span so many resource groups
that we overflow the journal blocks, and cause an assert failure.
By limiting the number of blocks in the transaction, we prevent this
overflow and give other running processes time to do transactions.

The limiting factor I chose is sd_log_thresh2 which is currently
set to 4/5ths of the journal. This same ratio is used in function
gfs2_ail_flush_reqd to determine when a log flush is required.
If we make the maximum value less than this, we can get into a
infinite hang whereby the log stops moving because the number of
used blocks is less than the threshold and the iterative loop
needs more, but since we're under the threshold, the log daemon
never starts any IO on the log.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 078cd827 14-Sep-2016 Deepa Dinamani <deepa.kernel@gmail.com>

fs: Replace CURRENT_TIME with current_time() for inode timestamps

CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 47a9a527 01-Aug-2016 Fabian Frederick <fabf@skynet.be>

GFS2: use BIT() macro

Replace 1 << value shift by more explicit BIT() macro

Also fixes two bare unsigned definitions:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned hsize = BIT(ip->i_depth);

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# 70246286 19-Jul-2016 Christoph Hellwig <hch@lst.de>

block: get rid of bio_rw and READA

These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# dfec8a14 05-Jun-2016 Mike Christie <mchristi@redhat.com>

fs: have ll_rw_block users pass in op and flags separately

This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 2a222ca9 05-Jun-2016 Mike Christie <mchristi@redhat.com>

fs: have submit_bh users pass in op and flags separately

This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a097dc7e 16-Jul-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Make rgrp reservations part of the gfs2_inode structure

Before this patch, multi-block reservation structures were allocated
from a special slab. This patch folds the structure into the gfs2_inode
structure. The disadvantage is that the gfs2_inode needs more memory,
even when a file is opened read-only. The advantages are: (a) we don't
need the special slab and the extra time it takes to allocate and
deallocate from it. (b) we no longer need to worry that the structure
exists for things like quota management. (c) This also allows us to
remove the calls to get_write_access and put_write_access since we
know the structure will exist.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b54e9a0b 26-Oct-2015 Bob Peterson <rpeterso@redhat.com>

GFS2: Extract quota data from reservations structure (revert 5407e24)

This patch basically reverts the majority of patch 5407e24.
That patch eliminated the gfs2_qadata structure in favor of just
using the reservations structure. The problem with doing that is that
it increases the size of the reservations structure. That is not an
issue until it comes time to fold the reservations structure into the
inode in memory so we know it's always there. By separating out the
quota structure again, we aren't punishing the non-quota users by
making all the inodes bigger, requiring more slab space. This patch
creates a new slab area to allocate the quota stuff so it's managed
a little more sanely.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>


# b8fbf471 17-Mar-2015 Abhi Das <adas@redhat.com>

gfs2: perform quota checks against allocation parameters

Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
and gfs2_quota_lock_check() to check for quota violations while
accounting for the new blocks requested by the current operation
in ap->target.

Previously, the number of new blocks requested during an operation
were not accounted for during quota_check and would allow these
operations to exceed quota. This was not very apparent since most
operations allocated only 1 block at a time and quotas would get
violated in the next operation. i.e. quota excess would only be by
1 block or so. With fallocate, (where we allocate a bunch of blocks
at once) the quota excess is non-trivial and is addressed by this
patch.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>


# b650738c 06-Aug-2014 Bob Peterson <rpeterso@redhat.com>

GFS2: Change maxlen variables to size_t

This patch changes some variables (especially maxlen in function
gfs2_block_map) from unsigned int to size_t. We need 64-bit arithmetic
for very large files (e.g. 1PB) where the variables otherwise get
shifted to all 0's.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c62baf65 14-May-2014 Fabian Frederick <fabf@skynet.be>

GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes

Fix 2 typos and move one definition which was between function
comments and function definition (yet another kernel-doc warning)

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b50f227b 03-Mar-2014 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up journal extent mapping

This patch fixes a long standing issue in mapping the journal
extents. Most journals will consist of only a single extent,
and although the cache took account of that by merging extents,
it did not actually map large extents, but instead was doing a
block by block mapping. Since the journal was only being mapped
on mount, this was not normally noticeable.

With the updated code, it is now possible to use the same extent
mapping system during journal recovery (which will be added in a
later patch). This will allow checking of the integrity of the
journal before any reply of the journal content is attempted. For
this reason the code is moving to bmap.c, since it will be used
more widely in due course.

An exercise left for the reader is to compare the new function
gfs2_map_journal_extents() with gfs2_write_alloc_required()

Additionally, should there be a failure, the error reporting is
also updated to show more detail about what went wrong.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7b9cff46 02-Oct-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add allocation parameters structure

This patch adds a structure to contain allocation parameters with
the intention of future expansion of this structure. The idea is
that we should be able to add more information about the allocation
in the future in order to allow the allocator to make a better job
of placing the requests on-disk.

There is no functional difference from applying this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# af5c2697 26-Sep-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up reservation removal

The reservation for an inode should be cleared when it is truncated so
that we can start again at a different offset for future allocations.
We could try and do better than that, by resetting the search based on
where the truncation started from, but this is only a first step.

In addition, there are three callers of gfs2_rs_delete() but only one
of those should really be testing the value of i_writecount. While
we get away with that in the other cases currently, I think it would
be better if we made that test specific to the one case which
requires it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7caef267 12-Sep-2013 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

truncate: drop 'oldsize' truncate_pagecache() parameter

truncate_pagecache() doesn't care about old size since commit
cedabed49b39 ("vfs: Fix vmtruncate() regression"). Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a01aedfe 26-Jun-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Reserve journal space for quota change in do_grow

If a GFS2 file system is mounted with quotas and a file is grown
in such a way that its free blocks for the allocation are represented
in a secondary bitmap, GFS2 ran out of blocks in the transaction.
That resulted in "fatal: assertion "tr->tr_num_buf <= tr->tr_blocks".
This patch reserves extra blocks for the quota change so the
transaction has enough space.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2b3dcf35 28-May-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Increase i_writecount during gfs2_setattr_size

This patch calls get_write_access in a few functions. This
merely increases inode->i_writecount for the duration of the function.
That will ensure that any file closes won't delete the inode's
multi-block reservation while the function is running.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 20095218 13-Mar-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Remove vestigial parameter ip from function rs_deltree

The functions that delete block reservations from the rgrp block
reservations rbtree no longer use the ip parameter. This patch
eliminates the parameter.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f4108a60 31-Jan-2013 Eric W. Biederman <ebiederm@xmission.com>

gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE

Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
so the constants may be well typed.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# d2b47cfb 31-Jan-2013 Bob Peterson <rpeterso@redhat.com>

GFS2: Get a block reservation before resizing a file

This patch allocates a block reservation structure before growing
or shrinking a file. Without this structure, the grow or shink code
can reference the bad pointer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 45138990 28-Jan-2013 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use ->writepages for ordered writes

Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.

The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.

The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 350a9b0a 13-Dec-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Split gfs2_trans_add_bh() into two

There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fa731fc4 13-Nov-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix truncation of journaled data files

This patch fixes an issue relating to not having enough revokes
available when truncating journaled data files. In order to ensure
that we do no run out, the truncation is broken into separate pieces
if it is large enough.

Tested using fsx on a journaled data file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9dbe9610 31-Oct-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add Orlov allocator

Just like ext3, this works on the root directory and any directory
with the +T flag set. Also, just like ext3, any subdirectory created
in one of the just mentioned cases will be allocated to a random
resource group (GFS2 equivalent of a block group).

If you are creating a set of directories, each of which will contain a
job running on a different node, then by setting +T on the parent
directory before creating the subdirectories, each will land up in a
different resource group, and thus resource group contention between
nodes will be kept to a minimum.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4a993fb1 31-Jul-2012 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add structure to contain rgrp, bitmap, offset tuple

This patch introduces a new structure, gfs2_rbm, which is a
tuple of a resource group, a bitmap within the resource group
and an offset within that bitmap. This is designed to make
manipulating these sets of variables easier. There is also a
new helper function which converts this representation back
to a disk block address.

In addition, the rbtree nodes which are used for the reservations
were not being correctly initialised, which is now fixed. Also,
the tracing was not passing through the inode where it should
have been. That is mostly fixed aside from one corner case. This
needs to be revisited since there can also be a NULL rgrp in
some cases which results in the device being incorrect in the
trace.

This is intended to be the first step towards cleaning up some
of the allocation code, and some further bug fixes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8e2e0047 19-Jul-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Reduce file fragmentation

This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common dirctory.

The implementation uses an rbtree of reservations attached to each
resource group (and each inode).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5407e242 18-May-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Fold quota data into the reservations struct

This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f2f9c812 10-May-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer

It turns out that the "new" parameter to function gfs2_meta_indirect_buffer
was always being passed in as zero. Therefore, this patch eliminates it
and simplifies the function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2f7ee358 12-Apr-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Use variable rather than qa to determine if unstuff necessary

In the future, the qadata structure will be eliminated and merged
back in with the block reservation structure, after we extend the
lifespan of that. This patch is a step forward in eliminating the
qadata structure. It adds a variable to the do_grow function to
determine when unstuffing is necessary, and has been done.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5e2f7d61 04-Apr-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Make sure rindex is uptodate before starting transactions

This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 220cca2a 19-Mar-2012 Bob Peterson <rpeterso@redhat.com>

GFS2: Change truncate page allocation to be GFP_NOFS

This patch changes the page allocation in gfs2_block_truncate_page
and two others to GFP_NOFS to avoid deadlock in low-memory conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 564e12b1 21-Nov-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: decouple quota allocations from block allocations

This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 6e87ed0f 18-Nov-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: move toward a generic multi-block allocator

This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3c5d785a 14-Nov-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: combine gfs2_alloc_block and gfs2_alloc_di

GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically
the same things, with a few exceptions. This patch combines
the two functions into a slightly more generic gfs2_alloc_block.
Having one centralized block allocation function will reduce
code redundancy and make it easier to implement multi-block
reservations to reduce file fragmentation in the future.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 87654896 08-Nov-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: More automated code analysis fixes

A potentially uninitialised variable, some unreachable code,
and the main part of this, fixing the error path in the
unlink function.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b99b98dc 21-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move readahead of metadata during deallocation into its own function

Move the recently added readahead of the indirect pointer
tree during deallocation into its own function in order
that we can use it elsewhere in the future. Also this
fixes the resetting of the "first" variable in the
original patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 64dd153c 12-Sep-2011 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: rewrite fallocate code to write blocks directly

GFS2's fallocate code currently goes through the page cache. Since it's only
writing to the end of the file or to holes in it, it doesn't need to, and it
was causing issues on low memory environments. This patch pulls in some of
Steve's block allocation work, and uses it to simply allocate the blocks for
the file, and zero them out at allocation time. It provides a slight
performance increase, and it dramatically simplifies the code.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bd5437a7 15-Sep-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: speed up delete/unlink performance for large files

This patch improves the performance of delete/unlink
operations in a GFS2 file system where the files are large
by adding a layer of metadata read-ahead for indirect blocks.
Mileage will vary, but on my system, deleting an 8.6G file
dropped from 22 seconds to about 4.5 seconds.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 70b0c365 02-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Use cached rgrp in gfs2_rlist_add()

Each block which is deallocated, requires a call to gfs2_rlist_add()
and each of those calls was calling gfs2_blk2rgrpd() in order to
figure out which rgrp the block belonged in. This can be speeded up
by making use of the rgrp cached in the inode. We also reset this
cached rgrp in case the block has changed rgrp. This should provide
a big reduction in gfs2_blk2rgrpd() calls during deallocation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d56fa8a1 01-Sep-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Call do_strip() directly from recursive_scan()

The recursive_scan() function only ever takes a single "bc"
argument, so we might as well just call do_strip() directly
from resource_scan() rather than pass it in as an argument.

Also the "data" argument is always a struct strip_mine, so
we can pass that in, rather than using a void pointer.

This also moves do_strip() ahead of recursive_scan() so that
we don't need to add a prototype.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8339ee54 31-Aug-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Make resource groups "append only" during life of fs

Since we have ruled out supporting online filesystem shrink,
it is possible to make the resource group list append only
during the life of a super block. This gives several benefits:

Firstly, we only need to read new rindex elements as they are added
rather than needing to reread the whole rindex file each time one
element is added.

Secondly, the rindex glock can be held for much shorter periods of
time, and is completely removed from the fast path for allocations.
The lock is taken in shared mode only when updating the resource
groups when the first allocation occurs, and after a grow has
taken place.

Thirdly, this results in a reduction in code size, and everything
gets a lot simpler to understand in this area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 562c72aa5 24-Jun-2011 Christoph Hellwig <hch@infradead.org>

fs: move inode_dio_wait calls into ->setattr

Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio referenes from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 46fcb2ed 23-Jun-2011 Eric Sandeen <sandeen@redhat.com>

GFS2: combine duplicated block freeing routines

__gfs2_free_data and __gfs2_free_meta are almost identical, and
can be trivially combined.

[This is as per Eric's original patch minus gfs2_free_data() which had
no callers left and plus the conversion of the bmap.c calls to these
functions. All in all, a nice clean up]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 6d3117b4 21-May-2011 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Wipe directory hash table metadata when deallocating a directory

The deallocation code for directories in GFS2 is largely divided into
two parts. The first part deallocates any directory leaf blocks and
marks the directory as being a regular file when that is complete. The
second stage was identical to deallocating regular files.

Regular files have their data blocks in a different
address space to directories, and thus what would have been normal data
blocks in a regular file (the hash table in a GFS2 directory) were
deallocated correctly. However, a reference to these blocks was left in the
journal (assuming of course that some previous activity had resulted in
those blocks being in the journal or ail list).

This patch uses the i_depth as a test of whether the inode is an
exhash directory (we cannot test the inode type as that has already
been changed to a regular file at this stage in deallocation)

The original issue was reported by Chris Hertel as an issue he encountered
running bonnie++

Reported-by: Christopher R. Hertel <crh@samba.org>
Cc: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 4c16c36a 23-Feb-2011 Bob Peterson <rpeterso@redhat.com>

GFS2: deallocation performance patch

This patch is a performance improvement to GFS2's dealloc code.
Rather than update the quota file and statfs file for every
single block that's stripped off in unlink function do_strip,
this patch keeps track and updates them once for every layer
that's stripped. This is done entirely inside the existing
transaction, so there should be no risk of corruption.
The other functions that deallocate blocks will be unaffected
because they are using wrapper functions that do the same
thing that they do today.

I tested this code on my roth cluster by creating 200
files in a directory, each of which is 100MB, then on
four nodes, I simultaneously deleted the files, thus competing
for GFS2 resources (but different files). The commands
I used were:

[root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
[root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done

The performance increase was significant:

roth-01 roth-02 roth-03 roth-05
--------- --------- --------- ---------
old: real 0m34.027 0m25.021s 0m23.906s 0m35.646s
new: real 0m22.379s 0m24.362s 0m24.133s 0m18.562s

Total time spent deleting:
old: 118.6s
new: 89.4

For this particular case, this showed a 25% performance increase for
GFS2 unlinks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e06dfc49 30-Nov-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix uninitialised error value in previous patch

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 086d8334 23-Nov-2010 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: fix recursive locking during rindex truncates

When you truncate the rindex file, you need to avoid calling gfs2_rindex_hold,
since you already hold it. However, if you haven't already read in the
resource groups, you need to do that.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bf97b673 27-Sep-2010 Benjamin Marzinski <bmarzins@redhat.com>

GFS2: reserve more blocks for transactions

Some of the functions in GFS2 were not reserving space in the transaction for
the resource group header and the resource groups bitblocks that get added
when you do allocation. GFS2 now makes sure to reserve space for the
resource group header and either all the bitblocks in the resource group, or
one for each block that it may allocate, whichever is smaller using the new
gfs2_rg_blocks() inline function.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a2e0f799 11-Aug-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Remove i_disksize

With the update of the truncate code, ip->i_disksize and
inode->i_size are merely copies of each other. This means
we can remove ip->i_disksize and use inode->i_size exclusively
reducing the size of a GFS2 inode by 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ff8f33c8 11-Aug-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: New truncate sequence

This updates GFS2's truncate code to use the new truncate
sequence correctly. This is a stepping stone to being
able to remove ip->i_disksize in favour of using i_size
everywhere now that the two sizes are always identical.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>


# c639d5d8 30-Jul-2010 Abhijith Das <adas@redhat.com>

GFS2: Fix typo in stuffed file data copy handling

trunc_start() in bmap.c incorrectly uses sizeof(struct gfs2_inode) instead of
sizeof(struct gfs2_dinode).

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 461cb419 24-Jun-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: Simplify gfs2_write_alloc_required

Function gfs2_write_alloc_required always returned zero as its
return code. Therefore, it doesn't need to return a return code
at all. Given that, we can use the return value to return whether
or not the dinode needs block allocations rather than passing
that value in, which in turn simplifies a bunch of error checking.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a8bf2bc2 24-Jun-2010 Bob Peterson <rpeterso@redhat.com>

GFS2: O_TRUNC not working on stuffed files across cluster

This patch replaces a statement that got dropped out by accident.
Without the patch, truncates on stuffed (very small) files cause
those files to have an unpredictable size.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5a0e3ad6 24-Mar-2010 Tejun Heo <tj@kernel.org>

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>


# 602c89d2 25-Mar-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up stuffed file copying

If the inode size was corrupt for stuffed files, it was possible
for the copying of data to overrun the block and/or page. This patch
checks for that condition so that this is no longer possible.

This is also preparation for the new truncate sequence patch which
requires the ability to have stuffed files with larger sizes than
(disk block size - sizeof(on disk inode)) with the restriction that
only the initial part of the file may be non-zero.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 07ccb7bf 12-Feb-2010 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix bmap allocation corner-case bug

This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.

By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 63997775 12-Jun-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Add tracepoints

This patch adds the ability to trace various aspects of the GFS2
filesystem. The trace points are divided into three groups,
glocks, logging and bmap. These points have been chosen because
they allow inspection of the major internal functions of GFS2
and they are also generic enough that they are unlikely to need
any major changes as the filesystem evolves.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 40bc9a27 10-Jun-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Fix cache coherency between truncate and O_DIRECT read

If a page was partially zeroed as the result of a truncate, then it was
not being correctly marked dirty. This resulted in the deleted data
reappearing if the file was read back via direct I/O.

Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b1e71b06 22-May-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Clean up some file names

This patch renames the ops_*.c files which have no counterpart
without the ops_ prefix in order to shorten the name and make
it more readable. In addition, ops_address.h (which was very
small) is moved into inode.h and inode.h is cleaned up by
adding extern where required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 09010978 20-May-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Improve resource group error handling

This patch improves the error handling in the case where we
discover that the summary information in the resource group
doesn't match the bitmap information while in the process of
allocating blocks. Originally this resulted in a kernel bug,
but this patch changes that so that we return -EIO and print
some messages explaining what went wrong, and how to fix it.

We also remember locally not to try and allocate from the
same rgrp again, so that a subsequent allocation in a
different rgrp should succeed.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f057f6cd 12-Jan-2009 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Merge lock_dlm module into GFS2

This is the big patch that I've been working on for some time
now. There are many reasons for wanting to make this change
such as:
o Reducing overhead by eliminating duplicated fields between structures
o Simplifcation of the code (reduces the code size by a fair bit)
o The locking interface is now the DLM interface itself as proposed
some time ago.
o Fewer lookups of glocks when processing replies from the DLM
o Fewer memory allocations/deallocations for each glock
o Scope to do further optimisations in the future (but this patch is
more than big enough for now!)

Please note that (a) this patch relates to the lock_dlm module and
not the DLM itself, that is still a separate module; and (b) that
we retain the ability to build GFS2 as a standalone single node
filesystem with out requiring the DLM.

This patch needs a lot of testing, hence my keeping it I restarted
my -git tree after the last merge window. That way, this has the maximum
exposure before its merged. This is (modulo a few minor bug fixes) the
same patch that I've been posting on and off the the last three months
and its passed a number of different tests so far.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7ed122e4 10-Dec-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Streamline alloc calculations for writes

This patch removes some unused code, and make the calculation
of the number of blocks required conditional in order to reduce
the number of times this (potentially expensive) calculation
is done.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 383f01fb 04-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Banish struct gfs2_dinode_host

The final field in gfs2_dinode_host was the i_flags field. Thats
renamed to i_diskflags in order to avoid confusion with the existing
inode flags, and moved into the inode proper at a suitable location
to avoid creating a "hole".

At that point struct gfs2_dinode_host is no longer needed and as
promised (quite some time ago!) it can now be removed completely.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c9e98886 04-Nov-2008 Steven Whitehouse <swhiteho@redhat.com>

GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize

This patch moved the i_size field from the gfs2_dinode_host and
following the ext3 convention renames it i_disksize.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5af4e7a0 23-Jun-2008 Benjamin Marzinski <bmarzins@redhat.com>

[GFS2] fix gfs2 block allocation (cleaned up)

This patch fixes bz 450641.

This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, The indirect blocks that point to the new data block must either
diverge from the existing tree either at the inode, or at the first
indirect block. They can diverge at the first indirect block because the
inode has room for 483 pointers while the indirect blocks have room for
509 pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d82661d9 10-Mar-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Streamline quota lock/check for no-quota case

This patch streamlines the quota checking in the "no quota" case by
making the check inline in the calling function, thus reducing the
number of function calls. Eventually we might be able to remove the
checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but
currently we can't as there are a very few places in the code which need
to call these functions directly still.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Abhijith Das <adas@redhat.com>


# 182fe5ab 03-Mar-2008 Cyrill Gorcunov <gorcunov@gmail.com>

[GFS2] possible null pointer dereference fixup

gfs2_alloc_get may fail so we have to check it to prevent
NULL pointer dereference.

Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9b8c81d1 22-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Allow bmap to allocate extents

We've supported mapping of extents when no block allocation is required
for some time. This patch extends that to mapping of extents when an
allocation has been requested. In that case we try to allocate as many
blocks as are requested, but we might return fewer in case there is
something preventing us from returning the complete amount (e.g. an
already allocated block is in the way).

Currently the only code path which can actually request multiple data
blocks in a single bmap call is the page_mkwrite path and even then it
only happens if there are multiple blocks per page. What this patch does
do however, is merge the allocation requests for metadata (growing the
metadata tree in either height or depth) with the allocation of the data
blocks in the case that both are needed. This results in lower overheads
even in the single block allocation case.

The one thing which we can't handle here at the moment is unstuffing. I
would like to be able to do that, but the problem which arises is that
in order to unstuff one has to get a locked page from the page cache
which results in locking problems in the (usual) case that the caller is
holding the page lock on the page it wishes to map. So that case will
have to be addressed in future patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e23159d2 12-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Get inode buffer only once per block map call

In the case that we needed to grow the height of the metadata tree
we were looking up the inode buffer and then brelse()ing it despite
the fact that it is needed later in the block map process.

This patch ensures that we look up the inode's buffer once and only
once during the block map process.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 77658aad 12-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Eliminate (almost) duplicate field from gfs2_inode

The blocks counter is almost a duplicate of the i_blocks
field in the VFS inode. The only difference is that i_blocks
can be only 32bits long for 32bit arch without large single file
support. Since GFS2 doesn't handle the non-large single file
case (for 32 bit anyway) this adds a new config dependency on
64BIT || LSF. This has always been the case, however we've never
explicitly said so before.

Even if we do add support for the non-LSF case, we will still
not require this field to be duplicated since we will not be
able to access oversized files anyway.

So the net result of all this is that we shave 8 bytes from a gfs2_inode
and get our config deps correct.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 30cbf189 08-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add a function to interate over an extent

This adds a function (currently the only use is during mapping
of already allocated blocks, but watch this space) which iterates
over a number of pointers in a block and returns the extent length.

If the initial pointer is 0 (i.e. unallocated) it will return the
number of unallocated blocks in the extent. If the initial pointer
is allocated, then it returns the number of contiguously allocated
blocks in the extent.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c85a665f 11-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] The case of the missing asterisk

A dereference was forgotten. This adds it back correctly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b45e41d7 06-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add extent allocation to block allocator

Rather than having to allocate a single block at a time, this patch
allows the block allocator to allocate an extent. Since there is
no difference (so far as the block allocator is concerned) between
data blocks and indirect blocks, it is posible to allocate a single
extent and for the caller to unrevoke just the blocks required
for indirect blocks.

Currently the only bit of GFS2 to make use of this feature is the
build height function. The intention is that gfs2_block_map will
be changed to make use of this feature in future patches.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1639431a 01-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Merge gfs2_alloc_meta and gfs2_alloc_data

Thanks to the preceeding patches, the only difference between
these two functions is their name. We can thus merge them
and call the new function gfs2_alloc_block to reflect the
fact that it can allocate either kind of block.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5731be53 01-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update gfs2_trans_add_unrevoke to accept extents

By adding an extra argument to gfs2_trans_add_unrevoke we can now
specify an extent length of blocks to unrevoke. This means that
we only need to make one pass through the list for each extent
rather than each block. Currently the only extent length which
is used is 1, but that will change in the future.

Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta
since its the only difference between this and gfs2_alloc_data
which is left. This will allow a future patch to merge these
two functions into one (i.e. one call to allocate both data
and metadata in a single extent in the future).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ce276b06 06-Feb-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reduce inode size by merging fields

There were three fields being used to keep track of the location
of the most recently allocated block for each inode. These have
been merged into a single field in order to better keep the
data and metadata for an inode close on disk, and also to reduce
the space required for storage.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dbac6710 29-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Introduce array of buffers to struct metapath

The reason for doing this is to allow all the block mapping code
to share the same array. As a result we can remove two arguments
from lookup_metapath since they are now returned via the array.

We also add a function to drop all refs to buffer heads when we
are done with the metapath. The build_height function shares the
struct metapath, but currently still frees its own buffers, and
this will change in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 11707ea0 28-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Move part of gfs2_block_map into a separate function

This is required to enable future changes to the block
mapping code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7eabb77e 28-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Misc fixups

This patch contains two small fixups that didn't fit elsewhere.
They are: (1) get rid of temp variable in find_metapath.
(2) Remove vestigial "ret" variable from gfs2_writepage_common.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fe6c991c 28-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] Get rid of unneeded parameter in gfs2_rlist_alloc

This patch removed the unnecessary parameter from function
gfs2_rlist_alloc. The parameter was always passed in as 0.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ecc30c79 28-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Streamline indirect pointer tree height calculation

This patch improves the calculation of the tree height in order to reduce
the number of operations which are carried out on each call to gfs2_block_map.
In the common case, we now make a single comparison, rather than calculating
the required tree height from scratch each time. Also in the case that the
tree does need some extra height, we start from the current height rather from
zero when we work out what the new height ought to be.

In addition the di_height field is moved into the inode proper and reduced
in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 941e6d7d 28-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Speed up gfs2_write_alloc_required, deprecate gfs2_extent_map

This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required,
instead we call gfs2_block_map directly. This results in fewer overall calls
to gfs2_block_map in the multi-block case.

Also, gfs2_extent_map is marked as deprecated so that people know that its
going away as soon as all the callers have been converted.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# eebd2aa3 04-Feb-2008 Christoph Lameter <clameter@sgi.com>

Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user

Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.

zero_user_segment(page, start, end)

Same for a single segment.

zero_user(page, start, length)

Length variant for the case where we know the length.

We remove the zero_user_page macro. Issues:

1. Its a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 makes a lot of code not having to be dealing
with the special casing for HIGHMEM anymore. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
function could not be a macro. zero_user_* functions introduced
here can be be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1af53572 16-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix write alloc required shortcut calculation

The comparison was being made against the wrong quantity.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 05220535 11-Jan-2008 Bob Peterson <rpeterso@redhat.com>

[GFS2] gfs2_alloc_required performance

This is a small I/O performance enhancement to gfs2. (Actually, it is a rework of
an earlier version I got wrong). The idea here is to check if the write extends
past the last block in the file. If so, the function can save itself a lot of
time and trouble because it knows an allocate will be required. Benchmarks like
iozone should see better performance.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 6dbd8224 10-Jan-2008 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reduce inode size by moving i_alloc out of line

It is possible to reduce the size of GFS2 inodes by taking the i_alloc
structure out of the gfs2_inode. This patch allocates the i_alloc
structure whenever its needed, and frees it afterward. This decreases
the amount of low memory we use at the expense of requiring a memory
allocation for each page or partial page that we write. A quick test
with postmark shows that the overhead is not measurable and I also note
that OCFS2 use the same approach.

In the future I'd like to solve the problem by shrinking down the size
of the members of the i_alloc structure, but for now, this reduces the
immediate problem of using too much low-memory on x86 and doesn't add
too much overhead.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b0d5fd30 11-Dec-2007 Bob Peterson <rpeterso@redhat.com>

[GFS2] Only fetch the dinode once in block_map

Function gfs2_block_map was often looking up the disk inode twice.
This optimizes it so that only does it once.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e9e1ef2b 10-Dec-2007 Bob Peterson <rpeterso@redhat.com>

[GFS2] Remove function gfs2_get_block

This patch is just a cleanup. Function gfs2_get_block() just calls
function gfs2_block_map reversing the last two parameters. By
reversing the parameters, gfs2_block_map() may be called directly
and function gfs2_get_block may be eliminated altogether.
Since this function is done for every block operation,
this streamlines the code and makes it a little bit more efficient.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bf36a713 17-Oct-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add gfs2_is_writeback()

This adds a function "gfs2_is_writeback()" along the lines of the
existing "gfs2_is_jdata()" in order to clean up the code and make
the various tests for the inode mode more obvious. It also fixes
the PageChecked() logic where we were resetting the flag too early
in the case of an error path.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 8475487b 02-Sep-2007 Bob Peterson <rpeterso@redhat.com>

[GFS2] Fix ordering of dirty/journal for ordered buffer unstuffing

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# eaf96527 27-Aug-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Don't mark jdata dirty in gfs2_unstuffer_page()

Journaled data is marked dirty by gfs2_unpin and should not be marked
dirty here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a13b8c5f 20-Aug-2007 Wendy Cheng <wcheng@redhat.com>

[GFS2] Reduce truncate IO traffic

Current GFS2 setattr call unconditionally invokes do_shrink even the
requested size and actual file size are equal. This has generated large
amount of extra IOs found during NFS benchmark runs. This patch moves
the relevant logic out of shrink code path. Since setattr is a system
call, the time stamps update is still required.

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1875f2f3 25-Jun-2007 S. Wendy Cheng <wcheng@redhat.com>

[GFS2] Fix gfs2_block_truncate_page err return

Code segment inside gfs2_block_truncate_page() doesn't set the return
code correctly. This causes NFSD erroneously returns EIO back to client
with setattr procedure call (truncate error).

Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4bd91ba1 05-Jun-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Add nanosecond timestamp feature

This adds a nanosecond timestamp feature to the GFS2 filesystem. Due
to the way that the on-disk format works, older filesystems will just
appear to have this field set to zero. When mounted by an older version
of GFS2, the filesystem will simply ignore the extra fields so that
it will again appear to have whole second resolution, so that its
trivially backward compatible.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bb8d8a6f 01-Jun-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix sign problem in quota/statfs and cleanup _host structures

This patch fixes some sign issues which were accidentally introduced
into the quota & statfs code during the endianess annotation process.
Also included is a general clean up which moves all of the _host
structures out of gfs2_ondisk.h (where they should not have been to
start with) and into the places where they are actually used (often only
one place). Also those _host structures which are not required any more
are removed entirely (which is the eventual plan for all of them).

The conversion routines from ondisk.c are also moved into the places
where they are actually used, which for almost every one, was just one
single place, so all those are now static functions. This also cleans up
the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__.

The net result is a reduction of about 100 lines of code, many functions
now marked static plus the bug fixes as mentioned above. For good
measure I ran the code through sparse after making these changes to
check that there are no warnings generated.

This fixes Red Hat bz #239686

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# dbb7cae2 15-May-2007 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Clean up inode number handling

This patch cleans up the inode number handling code. The main difference
is that instead of looking up the inodes using a struct gfs2_inum_host
we now use just the no_addr member of this structure. The tests relating
to no_formal_ino can then be done by the calling code. This has
advantages in that we want to do different things in different code
paths if the no_formal_ino doesn't match. In the NFS patch we want to
return -ESTALE, but in the ->lookup() path, its a bug in the fs if the
no_formal_ino doesn't match and thus we can withdraw in this case.

In order to later fix bz #201012, we need to be able to look up an inode
without knowing no_formal_ino, as the only information that is known to
us is the on-disk location of the inode in question.

This patch will also help us to fix bz #236099 at a later date by
cleaning up a lot of the code in that area.

There are no user visible changes as a result of this patch and there
are no changes to the on-disk format either.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 0507ecf5 10-May-2007 Nate Diller <nate.diller@gmail.com>

[GFS2] use zero_user_page

Use zero_user_page() instead of open-coding it.

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# cd354f1a 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de>

[PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ddfe0627 18-Jan-2007 Eric Sandeen <sandeen@redhat.com>

[GFS2] use CURRENT_TIME_SEC instead of get_seconds in gfs2

I was looking something else up and came across this...

I don't honestly have a good reason to change it other than to make it
like every other Linux filesystem in this regard. ;-) It doesn't
functionally change anything, but makes some lines shorter. :)

I'm also curious; why does gfs2 have 64-bits of on-disk timestamps, but
not in timespec_t format, and only stores second resolutions? Seems like
you're halfway to sub-second resolutions already.

I suppose if that gets implemented then all of the below should
instead be CURRENT_TIME not CURRENT_TIME_SEC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 4cf1ed81 15-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up bmap & fix boundary bug

This moves the locking for bmap into the bmap function itself
rather than using a wrapper function. It also fixes a bug where
the boundary flag was set on the wrong bh. Also the flags on
the mapped bh are reset earlier in the function to ensure that
they are 100% correct on the error path.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 9e2dbdac 08-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove gfs2_inode_attr_in

This function wasn't really doing the right thing. There was no need
to update the inode size at this point and the updating of the
i_blocks field has now been moved to the places where di_blocks is
updated. A result of this patch and some those preceeding it is that
unlocking a glock is now a much more efficient process, since there
is no longer any requirement to copy data from the gfs2 inode into
the vfs inode at this point.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 1a7b1eed 01-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Shrink gfs2_inode (6) - di_atime/di_mtime/di_ctime

Remove the di_[amc]time fields and use inode->i_[amc]time
fields instead. This saves 24 bytes from the gfs2_inode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 2933f925 01-Nov-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Shrink gfs2_inode (4) - di_uid/di_gid

Remove duplicate di_uid/di_gid fields in favour of using
inode->i_uid/inode->i_gid instead. This saves 8 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b60623c2 31-Oct-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Shrink gfs2_inode (3) - di_mode

This removes the duplicate di_mode field in favour of using the
inode->i_mode field. This saves 4 bytes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 539e5d6b 31-Oct-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Change argument of gfs2_dinode_out

Everywhere this was called, a struct gfs2_inode was available,
but despite that, it was always called with a struct gfs2_dinode
as an argument. By making this change it paves the way to start
eliminating fields duplicated between the kernel's struct inode
and the struct gfs2_dinode.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b44b84d7 14-Oct-2006 Al Viro <viro@zeniv.linux.org.uk>

[GFS2] gfs2 misc endianness annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 23591256 13-Oct-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix bmap to map extents properly

This fix means that bmap will map extents of the length requested
by the VFS rather than guessing at it, or just mapping one block
at a time. The other callers of gfs2_block_map are audited to ensure
they send the correct max extent lengths (i.e. set bh->b_size correctly).

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 48516ced 01-Oct-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove uneeded endian conversion

In many places GFS2 was calling the endian conversion routines
for an inode even when only a single field, or a few fields might
have changed. As a result we were copying lots of data needlessly.

This patch replaces those calls with conversion of just the
required fields in each case. This should be faster and easier
to understand. There are still other places which suffer from this
problem, but this is a start in the right direction.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 907b9bce 25-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2/DLM] Fix trailing whitespace

As per Andrew Morton's request, removed trailing whitespace.

Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7276b3b0 21-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up meta_io code

Fix a bug in the directory reading code, where we might have dereferenced
a NULL pointer in case of OOM. Updated the directory code to use the new
& improved version of gfs2_meta_ra() which now returns the first block
that was being read. Previously it was releasing it requiring following
code to grab the block again at each point it was called.

Also turned off readahead on directory lookups since we are reading a
hash table, and therefore reading the entries in order is very
unlikely. Readahead is still used for all other calls to the
directory reading function (e.g. when growing the hash table).

Removed the DIO_START constant. Everywhere this was used, it was
used to unconditionally start i/o aside from a couple of places, so
I've removed it and made the couple of exceptions to this rule into
separate functions.

Also hunted through the other DIO flags and removed them as arguments
from functions which were always called with the same combination of
arguments.

Updated gfs2_meta_indirect_buffer to be a bit more efficient and
hopefully also be a bit easier to read.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7d308590 18-Sep-2006 Fabio Massimo Di Nitto <fabbione@ubuntu.com>

[GFS2] Export lm_interface to kernel headers


lm_interface.h has a few out of the tree clients such as GFS1
and userland tools.

Right now, these clients keeps a copy of the file in their build tree
that can go out of sync.

Move lm_interface.h to include/linux, export it to userland and
clean up fs/gfs2 to use the new location.

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f3b30912 18-Sep-2006 akpm@osdl.org <akpm@osdl.org>

[GFS2] inode-diet-eliminate-i_blksize-and-use-a-per-superblock-default-vs-gfs2

i_blksize got removed in -mm.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 7a6bbacb 18-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Map multiple blocks at once where possible

This is a tidy up of the GFS2 bmap code. The main change is that the
bh is passed to gfs2_block_map allowing the flags to be set directly
rather than having to repeat that code several times in ops_address.c.

At the same time, the extent mapping code from gfs2_extent_map has
been moved into gfs2_block_map. This allows all calls to gfs2_block_map
to map extents in the case that no allocation is taking place. As a
result reads and non-allocating writes should be faster. A quick test
with postmark appears to support this.

There is a limit on the number of blocks mapped in a single bmap
call in that it will only ever map blocks which are pointed to
from a single pointer block. So in other words, it will never try
to do additional i/o in order to satisfy read-ahead. The maximum
number of blocks is thus somewhat less than 512 (the GFS2 4k block
size minus the header divided by sizeof(u64)). I've further limited
the mapping of "normal" blocks to 32 blocks (to avoid extra work)
since readpages() will currently read a maximum of 32 blocks ahead (128k).

Some further work will probably be needed to set a suitable value
for DIO as well, but for now thats left at the maximum 512 (see
ops_address.c:gfs2_get_block_direct).

There is probably a lot more that can be done to improve bmap for GFS2,
but this is a good first step.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c5392124 05-Sep-2006 Jan Engelhardt <jengelh@linux01.gwdg.de>

[GFS2] More style changes

Remove redundant brackets

Signed-off-by: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# c2668711 04-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove a cast, tidy gfs2_inode_attr_in

The remains of the changes for Jan Engelhardt's third email. Remove
a cast and tidy up gfs2_inode_attr_in.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# cd915493 03-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Change all types to uX style

This makes all fixed size types have consistent names.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# a91ea69f 03-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Align all labels against LH side

This makes everything consistent.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 75d3b817 04-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up bmap/inode code

As per Jan Engelhardt's third set of comments, this make various
code style changes and moves the structures from format.h into
super.c, which was the only place that format.h was actually used.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e9fc2aa0 01-Sep-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright, tidy up incore.h

As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this
updates the copyright message to say "version" in full rather than
"v.2". Also incore.h has been updated to remove forward structure
declarations which are not required.

The gfs2_quota_lvb structure has now had endianess annotations added
to it. Also quota.c has been updated so that we now store the
lvb data locally in endian independant format to avoid needing
a structure in host endianess too. As a result the endianess
conversions are done as required at various points and thus the
conversion routines in lvb.[ch] are no longer required. I've
moved the one remaining constant in lvb.h thats used into lm.h
and removed the unused lvb.[ch].

I have not changed the HIF_ constants. That is left to a later patch
which I hope will unify the gh_flags and gh_iflags fields of the
struct gfs2_holder.

Cc: Jan Engelhardt <jengelh@linux01.gwdg.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# ba7f7290 26-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove page.[ch]

The remaining routines in page.c were all only used in one other
file, so they are now moved into the files where they are referenced
and made static. Thus page.[ch] are no longer required.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# f25ef0c1 26-Jul-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy gfs2_unstuffer_page

Tidy up gfs2_unstuffer_page by:

a) Moving it into bmap.c
b) Making it static
c) Calling it directly from gfs2_unstuff_dinode
d) Updating all callers of gfs2_unstuff_dinode due to one less
required argument.

It doesn't change the behaviour at all.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# feaa7bba 14-Jun-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Fix unlinked file handling

This patch fixes the way we have been dealing with unlinked,
but still open files. It removes all limits (other than memory
for inodes, as per every other filesystem) on numbers of these
which we can support on GFS2. It also means that (like other
fs) its the responsibility of the last process to close the file
to deallocate the storage, rather than the person who did the
unlinking. Note that with GFS2, those two events might take place
on different nodes.

Also there are a number of other changes:

o We use the Linux inode subsystem as it was intended to be
used, wrt allocating GFS2 inodes
o The Linux inode cache is now the point which we use for
local enforcement of only holding one copy of the inode in
core at once (previous to this we used the glock layer).
o We no longer use the unlinked "special" file. We just ignore it
completely. This makes unlinking more efficient.
o We now use the 4th block allocation state. The previously unused
state is used to track unlinked but still open inodes.
o gfs2_inoded is no longer needed
o Several fields are now no longer needed (and removed) from the in
core struct gfs2_inode
o Several fields are no longer needed (and removed) from the in core
superblock

There are a number of future possible optimisations and clean ups
which have been made possible by this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 3a8a9a10 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update copyright date to 2006

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# bd896801 18-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove semaphore.h from C files

We no longer use semaphores, everything has been converted to
mutex or rwsem, so we don't need to include this header any more.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# e90c01e1 11-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Reverse block order in build_height

The original code ordered the blocks allocated in the build_height
routine backwards causing excessive disk seeks during a read of the
metadata. This patch reverses the order to try and reduce disk seeks.

Example: A five level metadata tree, I = Inode, P = Pointers, D = Data

You need to read the blocks in the order:

I P5 P4 P3 P2 P1 D

in order to read a single data block. The new code now orders the blocks
in this way. The old code used to order them as:

I P1 P2 P3 P4 P5 D

requiring two extra seeks on average. Note that for files which are
grown by gradual extension rather than by truncate or by llseek/write
at a large offset, this doesn't apply. In the case of writing to a
file linearly, this routine will only be called upon to extend the
height of the tree by one block at a time, so the ordering is
determined by when its called rather than by the internals of the
routine itself. Optimising that part of the ordering is a much
harder problem.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# fd88de56 05-May-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Readpages support

This adds readpages support (and also corrects a small bug in
the readpage error path at the same time). Hopefully this will
improve performance by allowing GFS to submit larger lumps of
I/O at a time.

In order to simplify the setting of BH_Boundary, it currently gets
set when we hit the end of a indirect pointer block. There is
always a boundary at this point with the current allocation code.
It doesn't get all the boundaries right though, so there is still
room for improvement in this.

See comments in fs/gfs2/ops_address.c for further information about
readpages with GFS2.

Signed-off-by: Steven Whitehouse


# 56409abb 28-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Remove some unused code

Remove some of the unused code flagged up by Adrian Bunk.

Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse


# 08bc2dbc 28-Apr-2006 Adrian Bunk <bunk@stusta.de>

[GFS2] [-mm patch] fs/gfs2/: possible cleanups

This patch contains the following possible cleanups:
- make needlessly global code static
- #if 0 unused functions
- remove the following global function that was both unused and
unimplemented:
- super.c: gfs2_do_upgrade()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 61e085a8 24-Apr-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Tidy up dir code as per Christoph Hellwig's comments

1. Comment whitespace fix
2. Removed unused header files from dir.c
3. Split the gfs2_dir_get_buffer() function into two functions

Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 71b86f56 28-Mar-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Further updates to dir and logging code

This reduces the size of the directory code by about 3k and gets
readdir() to use the functions which were introduced in the previous
directory code update.

Two memory allocations are merged into one. Eliminates zeroing of some
buffers which were never used before they were initialised by
other data.

There is still scope for further improvement in the directory code.

On the logging side, a hand created mutex has been replaced by a
standard Linux mutex in the log allocation code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 5c676f6d 27-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Macros removal in gfs2.h

As suggested by Pekka Enberg <penberg@cs.helsinki.fi>.

The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h
The other macros are gone from gfs2.h as (although not requested
by Pekka Enberg) are a number of included header file which are now
included individually. The inode number comparison function is
now an inline function.

The DT2IF and IF2DT may be addressed in a future patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 568f4c96 26-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] 80 Column audit of GFS2

Requested by:
Prarit Bhargava <prarit@redhat.com>

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 18ec7d5c 08-Feb-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Make journaled data files identical to normal files on disk

This is a very large patch, with a few still to be resolved issues
so you might want to check out the previous head of the tree since
this is known to be unstable. Fixes for the various bugs will be
forthcoming shortly.

This patch removes the special data format which has been used
up till now for journaled data files. Directories still retain the
old format so that they will remain on disk compatible with earlier
releases. As a result you can now do the following with journaled
data files:

1) mmap them
2) export them over NFS
3) convert to/from normal files whenever you want to (the zero length
restriction is gone)

In addition the level at which GFS' locking is done has changed for all
files (since they all now use the page cache) such that the locking is
done at the page cache level rather than the level of the fs operations.
This should mean that things like loopback mounts and other things which
touch the page cache directly should now work.

Current known issues:

1. There is a lock mode inversion problem related to the resource
group hold function which needs to be resolved.
2. Any significant amount of I/O causes an oops with an offset of hex 320
(NULL pointer dereference) which appears to be related to a journaled data
buffer appearing on a list where it shouldn't be.
3. Direct I/O writes are disabled for the time being (will reappear later)
4. There is probably a deadlock between the page lock and GFS' locks under
certain combinations of mmap and fs operation I/O.
5. Issue relating to ref counting on internally used inodes causes a hang
on umount (discovered before this patch, and not fixed by it)
6. One part of the directory metadata is different from GFS1 and will need
to be resolved before next release.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 257f9b4e 31-Jan-2006 Steven Whitehouse <swhiteho@redhat.com>

[GFS2] Update truncate function (shrinking partial blocks)

Update the function in GFS2 which deals with truncation of
partial blocks. Some of the code is "borrowed" from ext3
since it appears to give a good model of how to do this
operation. The function is renamed gfs2_block_truncate_page
accordingly.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# aa6a85a9 24-Jan-2006 Steven Whitehouse <steve@men-an-tol.chygwyn.com>

[GFS2] Remove pointless argument relating to truncate

For some reason a function pointer was being passed through
the truncate code which only ever took one value. This removes
the function pointer and replaces it with a single call to
the function in question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# d4e9c4c3 18-Jan-2006 Steven Whitehouse <steve@chygwyn.com>

[GFS2] Add an additional argument to gfs2_trans_add_bh()

This adds an extra argument to gfs2_trans_add_bh() to indicate whether the
bh being added to the transaction is metadata or data. Its currently unused
since all existing callers set it to 1 (metadata) but following patches will
make use of it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# 666a2c53 18-Jan-2006 Steven Whitehouse <steve@chygwyn.com>

[GFS2] Remove unused code from various files

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>


# b3b94faa 16-Jan-2006 David Teigland <teigland@redhat.com>

[GFS2] The core of GFS2

This patch contains all the core files for GFS2.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>