#
e345b87b |
|
19-Dec-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix freeze consistency check in log_write_header Functions gfs2_freeze_super() and gfs2_thaw_super() are using the SDF_FROZEN flag to indicate when the filesystem is frozen, synchronized by sd_freeze_mutex. However, this doesn't prevent writes from happening between the point of calling thaw_super() and the point where the SDF_FROZEN flag is cleared, so the following assert can trigger in log_write_header(): gfs2_assert_withdraw(sdp, !test_bit(SDF_FROZEN, &sdp->sd_flags)); Fix that by checking for sb->s_writers.frozen != SB_FREEZE_COMPLETE in log_write_header() instead. To make sure that the filesystem-specific part of freezing happens before sb->s_writers.frozen is set to SB_FREEZE_COMPLETE, move that code from gfs2_freeze_locally() into gfs2_freeze_fs() and hook that up to the .freeze_fs operation. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
edd13270 |
|
17-Dec-2023 |
Kevin Hao <haokexin@gmail.com> |
gfs2: Use wait_event_freezable_timeout() for freezable kthread A freezable kernel thread can enter frozen state during freezing by either calling try_to_freeze() or using wait_event_freezable() and its variants. So for the following snippet of code in a kernel thread loop: try_to_freeze(); wait_event_interruptible_timeout(); We can change it to a simple wait_event_freezable_timeout() and then eliminate a function call. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
76e7211c |
|
17-Dec-2023 |
Kevin Hao <haokexin@gmail.com> |
gfs2: Add missing set_freezable() for freezable kthread The kernel thread function gfs2_logd() and gfs2_quotad() invoke the try_to_freeze() in its loop. But all the kernel threads are no-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
e0f1f021 |
|
20-Dec-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Lift withdraw check out of gfs2_ail1_empty Lift the check for the SDF_WITHDRAWING flag out of gfs2_ail1_empty() and into its callers. This is needed so that gfs2_flush_revokes() can drop the sd_log_lock spinlock before triggering a withdraw if necessary. Instead of checking for the SDF_WITHDRAWING flag, use gfs2_withdrawing(). Also, the low-level code triggering the delayed withdraw reports when there is a problem, so there is no need to report that again. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4d927b03 |
|
20-Dec-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawn This function checks whether the filesystem has been been marked to be withdrawn eventually or has been withdrawn already. Rename this function to avoid confusing code like checking for gfs2_withdrawing() when gfs2_withdrawn() has already returned true. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
47106428 |
|
20-Dec-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Minor gfs2_ail1_empty cleanup Change gfs2_ail1_empty() to return %true when the ail1 list is empty. Based on that, make the loop in empty_ail1_list() more obvious. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
fe0690f0 |
|
25-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Sanitize kthread stopping Immediately stop the logd and quotad kernel threads when a filesystem withdraw is detected: those threads aren't doing anything useful after a withdraw. (Depends on the extra logd and quotad task struct references held since commit 7a109f383fa3 ("gfs2: Fix asynchronous thread destruction").) In addition, check for kthread_should_stop() in the wait condition in gfs2_quotad() to stop immediately when kthread_stop() is called. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
db77789b |
|
11-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: journal flush threshold fixes and cleanup Commit f07b35202148 ("GFS2: Made logd daemon take into account log demand") changed gfs2_ail_flush_reqd() and gfs2_jrnl_flush_reqd() to take sd_log_blks_needed into account, but the checks in gfs2_log_commit() were not updated correspondingly. Once that is fixed, gfs2_jrnl_flush_reqd() and gfs2_ail_flush_reqd() can be used in gfs2_log_commit(). Make those two helpers available to gfs2_log_commit() by defining them above gfs2_log_commit(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b6b8f72a |
|
17-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix logd wakeup on I/O error When quotad detects an I/O error, it sets sd_log_error and then it wakes up logd to withdraw the filesystem. However, logd doesn't wake up when sd_log_error is set. Fix that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b74cd55a |
|
10-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: low-memory forced flush fixes First, function gfs2_ail_flush_reqd checks the SDF_FORCE_AIL_FLUSH flag to determine if an AIL flush should be forced in low-memory situations. However, it also immediately clears the flag, and when called repeatedly as in function gfs2_logd, the flag will be lost. Fix that by pulling the SDF_FORCE_AIL_FLUSH flag check out of gfs2_ail_flush_reqd. Second, function gfs2_writepages sets the SDF_FORCE_AIL_FLUSH flag whether or not enough pages were written. If enough pages could be written, flushing the AIL is unnecessary, though. Third, gfs2_writepages doesn't wake up logd after setting the SDF_FORCE_AIL_FLUSH flag, so it can take a long time for logd to react. It would be preferable to wake up logd, but that hurts the performance of some workloads and we don't quite understand why so far, so don't wake up logd so far. Fixes: b066a4eebd4f ("gfs2: forcibly flush ail to relieve memory pressure") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
6df373b0 |
|
17-Aug-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Switch to wait_event in gfs2_logd In gfs2_logd(), switch from an open-coded wait loop to wait_event_interruptible_timeout(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5432af15 |
|
18-Aug-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Replace sd_freeze_state with SDF_FROZEN flag Replace sd_freeze_state with a new SDF_FROZEN flag. There no longer is a need for indicating that a freeze is in progress (SDF_STARTING_FREEZE); we are now protecting the critical sections with the sd_freeze_mutex. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b77b4a48 |
|
14-Nov-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rework freeze / thaw logic So far, at mount time, gfs2 would take the freeze glock in shared mode and then immediately drop it again, turning it into a cached glock that can be reclaimed at any time. To freeze the filesystem cluster-wide, the node initiating the freeze would take the freeze glock in exclusive mode, which would cause the freeze glock's freeze_go_sync() callback to run on each node. There, gfs2 would freeze the filesystem and schedule gfs2_freeze_func() to run. gfs2_freeze_func() would re-acquire the freeze glock in shared mode, thaw the filesystem, and drop the freeze glock again. The initiating node would keep the freeze glock held in exclusive mode. To thaw the filesystem, the initiating node would drop the freeze glock again, which would allow gfs2_freeze_func() to resume on all nodes, leaving the filesystem in the thawed state. It turns out that in freeze_go_sync(), we cannot reliably and safely freeze the filesystem. This is primarily because the final unmount of a filesystem takes a write lock on the s_umount rw semaphore before calling into gfs2_put_super(), and freeze_go_sync() needs to call freeze_super() which also takes a write lock on the same semaphore, causing a deadlock. We could work around this by trying to take an active reference on the super block first, which would prevent unmount from running at the same time. But that can fail, and freeze_go_sync() isn't actually allowed to fail. To get around this, this patch changes the freeze glock locking scheme as follows: At mount time, each node takes the freeze glock in shared mode. To freeze a filesystem, the initiating node first freezes the filesystem locally and then drops and re-acquires the freeze glock in exclusive mode. All other nodes notice that there is contention on the freeze glock in their go_callback callbacks, and they schedule gfs2_freeze_func() to run. There, they freeze the filesystem locally and drop and re-acquire the freeze glock before re-thawing the filesystem. This is happening outside of the glock state engine, so there, we are allowed to fail. From a cluster point of view, taking and immediately dropping a glock is indistinguishable from taking the glock and only dropping it upon contention, so this new scheme is compatible with the old one. Thanks to Li Dong <lidong@vivo.com> for reporting a locking bug in gfs2_freeze_func() in a previous version of this commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
11551cf1 |
|
15-Dec-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
gfs2: replace obvious uses of b_page with b_folio These places just use b_page to get to the buffer's address_space. Link: https://lkml.kernel.org/r/20221215214402.3522366-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
95ecbd0f |
|
19-Jan-2023 |
Andreas Gruenbacher <agruenba@redhat.com> |
Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one" Commit b2b0a5e97855 switched from generic_writepages() to filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to replacing ->writepage() with ->writepages() and eventually eliminating the former. Function gfs2_ail1_start_one() is called from gfs2_log_flush(), our main function for flushing the filesystem log. Unfortunately, at least as implemented today, ->writepage() and ->writepages() are entirely different operations for journaled data inodes: while the former creates and submits transactions covering the data to be written, the latter flushes dirty buffers out to disk. With gfs2_ail1_start_one() now calling ->writepages(), we end up creating filesystem transactions while we are in the course of a log flush, which immediately deadlocks on the sdp->sd_log_flush_lock semaphore. Work around that by going back to how things used to work before commit b2b0a5e97855 for now; figuring out a superior solution will take time we don't have available right now. However ... Since the removal of generic_writepages() is imminent, open-code it here. We're already inside a blk_start_plug() ... blk_finish_plug() section here, so skip that part of the original generic_writepages(). This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
|
#
b2b0a5e9 |
|
22-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
gfs2: stop using generic_writepages in gfs2_ail1_start_one Use filemap_fdatawrite_wbc instead of generic_writepages in gfs2_ail1_start_one so that the functin can also cope with address_space operations that only implement ->writepages and to properly account for cgroup writeback. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
67688c08 |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
fs/gfs2: Use the enum req_op and blk_opf_t types Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for variables that represent request flags. Combine the first two gfs2_submit_bhs() arguments into a single argument. Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-54-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
dc7674ed |
|
29-Jun-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: tiny cleanup in gfs2_log_reserve Function gfs2_log_reserve was setting revoke_blks to 0. There's no need because it calculates it shortly thereafter. This patch removes the unnecessary set. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
f5456b5d |
|
19-May-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Clean up revokes on normal withdraws Before this patch, the system ail lists were cleaned up if the logd process withdrew, but on other withdraws, they were not cleaned up. This included the cleaning up of the revokes as well. This patch reorganizes things a bit so that all withdraws (not just logd) clean up the ail lists, including any pending revokes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c551f66c |
|
30-Mar-2021 |
Lee Jones <lee.jones@linaro.org> |
gfs2: Fix a number of kernel-doc warnings Building the kernel with W=1 results in a number of kernel-doc warnings like incorrect function names and parameter descriptions. Fix those, mostly by adding missing parameter descriptions, removing left-over descriptions, and demoting some less important kernel-doc comments into regular comments. Originally proposed by Lee Jones; improved and combined into a single patch by Andreas. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
152f58c9 |
|
27-Mar-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extent We don't need two very similar functions for mapping logical blocks to physical blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4f0f586b |
|
08-Apr-2021 |
Sami Tolvanen <samitolvanen@google.com> |
treewide: Change list_sort to use const pointers list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
|
#
0efc4976 |
|
12-Mar-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: bypass log flush if the journal is not live Patch fe3e397668775 ("gfs2: Rework the log space allocation logic") changed gfs2_log_flush to reserve a set of journal blocks in case no transaction is active. However, gfs2_log_flush also gets called in cases where we don't have an active journal, for example, for spectator mounts. In that case, trying to reserve blocks would sleep forever, but we want gfs2_log_flush to be a no-op instead. Fixes: fe3e397668775 ("gfs2: Rework the log space allocation logic") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
1a5a2cfd |
|
25-Feb-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix use-after-free in trans_drain This patch adds code to function trans_drain to remove drained bd elements from the ail lists, if queued, before freeing the bd. If we don't remove the bd from the ail, function ail_drain will try to reference the bd after it has been freed by trans_drain. Thanks to Andy Price for his analysis of the problem. Reported-by: Andy Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
17d77684 |
|
18-Feb-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush In gfs2_ail1_flush, we're using I/O plugging to give the block layer a better chance of merging I/O requests. If we're too aggressive here, we can end up waiting on I/O to complete while still plugged. Fix that in a way similar to writeback_sb_inodes, except that we can't use blk_flush_plug because blk_flush_plug_list is not exported. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
2129b428 |
|
17-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Per-revoke accounting in transactions In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
fe3e3976 |
|
09-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rework the log space allocation logic The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
71b219f4 |
|
29-Jan-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Minor calc_reserved cleanup No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
76fce654 |
|
19-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Move function gfs2_ail_empty_tr Move this function further up in log.c so that we can use it in the next patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5cb738b5 |
|
18-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of current_tail() Keep the current value of the updated log tail in the super block as sb_log_flush_tail instead of computing it on the fly. This avoids unnecessary sd_ail_lock taking and cleans up the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5ae8fff8 |
|
13-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Clean up gfs2_log_reserve Wake up log waiters in gfs2_log_release when log space has actually become available. This is a much better place for the wakeup than gfs2_logd. Check if enough log space is immeditely available before anything else. If there isn't, use io_wait_event to wait instead of open-coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4a3d049d |
|
10-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Don't wait for journal flush in clean_journal Commit 588bff95c94e added gfs2_write_log_header() and started using it in clean_journal(), with an additional call to log_flush_wait() at the end of gfs2_write_log_header() which is unnecessary for clean_journal(). Move that call out of gfs2_write_log_header() to restore the previous behavior. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c1eba1b0 |
|
12-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Move lock flush locking to gfs2_trans_{begin,end} Move the read locking of sd_log_flush_lock from gfs2_log_reserve to gfs2_trans_begin, and its unlocking from gfs2_log_release to gfs2_trans_end. Use gfs2_log_release in two places in which it was open coded before. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f3708fb5 |
|
13-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of sd_reserving_log This counter and the associated wait queue are only used so that gfs2_make_fs_ro can efficiently wait for all pending log space allocations to fail after setting the filesystem to read-only. This comes at the cost of waking up that wait queue very frequently. Instead, when gfs2_log_reserve fails because the filesystem has become read-only, Wake up sd_log_waitq. In gfs2_make_fs_ro, set the file system read-only and then wait until all the log space has been released. Give up and report the problem after a while. With that, sd_reserving_log and sd_reserving_log_wait can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c968f578 |
|
29-Jan-2021 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Clean up on-stack transactions Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only needs to be set in the exceptional case of on-stack transactions. Split off __gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded version in gfs2_ail_empty_gl. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
82218943 |
|
21-Jan-2021 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: keep bios separate for each journal The recovery func can recover multiple journals, but they were all using the same bio. This resulted in use-after-free related to sdp->sd_log_bio. This patch moves the variable to the journal descriptor, jd, so that every recovery can operate on its own bio. And hopefully we never run out. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
4a011849 |
|
20-Jan-2021 |
Bob Peterson <rpeterso@redhat.com> |
Revert "GFS2: Re-add a call to log_flush_wait when flushing the journal" This reverts commit 428fd95d859b24fea448380fa21ad6d841b34241. Patch 428fd95d85b2 added a call to log_flush_wait to function gfs2_log_flush. Then gfs2_log_flush calls log_write_header which submits a write request with the REQ_PREFLUSH flag which also forces it to wait. This patch removes the unnecessary call to log_flush_wait. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
6e80674a |
|
11-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Clean up ail2_empty Clean up the logic in ail2_empty (no functional change). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
e7501bf8 |
|
18-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rename gfs2_{write => flush}_revokes Function gfs2_write_revokes doesn't actually write any revokes; instead, it adds revokes to the system transaction during a flush. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
6188e877 |
|
06-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Some documentation updates The calc_reserved description claims that buf_limit is 502 (on 4k filesystems), but it is actually 503. Fix / clarify the entire description. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5a4e9c60 |
|
06-Dec-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Minor gfs2_write_revokes cleanups Clean up the computations in gfs2_write_revokes (no change in functionality). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
4e79e3f0 |
|
12-Nov-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix case in which ail writes are done to jdata holes Patch b2a846dbef4e ("gfs2: Ignore journal log writes for jdata holes") tried (unsuccessfully) to fix a case in which writes were done to jdata blocks, the blocks are sent to the ail list, then a punch_hole or truncate operation caused the blocks to be freed. In other words, the ail items are for jdata holes. Before b2a846dbef4e, the jdata hole caused function gfs2_block_map to return -EIO, which was eventually interpreted as an IO error to the journal, and then withdraw. This patch changes function gfs2_get_block_noalloc, which is only used for jdata writes, so it returns -ENODATA rather than -EIO, and when -ENODATA is returned to gfs2_ail1_start_one, the error is ignored. We can safely ignore it because gfs2_ail1_start_one is only called when the jdata pages have already been written and truncated, so the ail1 content no longer applies. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
36c78309 |
|
18-Aug-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: make gfs2_ail1_empty_one return the count of active items This patch is one baby step toward simplifying the journal management. It simply changes function gfs2_ail1_empty_one from a void to an int and makes it return a count of active items. This allows the caller to check the return code rather than list_empty on the tr_ail1_list. This way we can, in a later patch, combine transaction ail1 and ail2 lists. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
68942870 |
|
22-Jul-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe Before this patch, when blocks were freed, it called gfs2_meta_wipe to take the metadata out of the pending journal blocks. It did this mostly by calling another function called gfs2_remove_from_journal. This is shortsighted because it does not do anything with jdata blocks which may also be in the journal. This patch expands the function so that it wipes out jdata blocks from the journal as well, and it wipes it from the ail1 list if it hasn't been written back yet. Since it now processes jdata blocks as well, the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe. New function gfs2_ail1_wipe wants a static view of the ail list, so it locks the sd_ail_lock when removing items. To accomplish this, function gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now the caller's responsibility to do so. I was going to make sd_ail_lock locking conditional, but the practice is generally frowned upon. For details, see: https://lwn.net/Articles/109066/ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
77650bdb |
|
20-Aug-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: add missing log_blocks trace points in gfs2_write_revokes Function gfs2_write_revokes was incrementing and decrementing the number of log blocks free, but there was never a log_blocks trace point for it. Thus, the free blocks from a log_blocks trace would jump around mysteriously. This patch adds the missing trace points so the trace makes more sense. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
521031fa |
|
31-Aug-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix bad comment for trans_drain Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5a61ae14 |
|
28-Aug-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Make sure we don't miss any delayed withdraws Commit ca399c96e96e changes gfs2_log_flush to not withdraw the filesystem while holding the log flush lock, but it fails to check if the filesystem needs to be withdrawn once the log flush lock has been released. Likewise, commit f05b86db314d depends on gfs2_log_flush to trigger for delayed withdraws. Add that and clean up the code flow somewhat. In gfs2_put_super, add a check for delayed withdraws that have been missed to prevent these kinds of bugs in the future. Fixes: ca399c96e96e ("gfs2: flesh out delayed withdraw for gfs2_log_flush") Fixes: f05b86db314d ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write") Cc: stable@vger.kernel.org # v5.7+: 462582b99b607: gfs2: add some much needed cleanup for log flushes that fail Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
462582b9 |
|
21-Aug-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: add some much needed cleanup for log flushes that fail When a log flush fails due to io errors, it signals the failure but does not clean up after itself very well. This is because buffers are added to the transaction tr_buf and tr_databuf queue, but the io error causes gfs2_log_flush to bypass the "after_commit" functions responsible for dequeueing the bd elements. If the bd elements are added to the ail list before the error, function ail_drain takes care of dequeueing them. But if they haven't gotten that far, the elements are forgotten and make the transactions unable to be freed. This patch introduces new function trans_drain which drains the bd elements from the transaction so they can be freed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b57bc0fb |
|
22-Jul-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix inaccurate comment The comment regarding journal flush thresholds is wrong. This patch fixes it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
7542486b |
|
17-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: eliminate GIF_ORDERED in favor of list_empty In several places, we used the GIF_ORDERED inode flag to determine if an inode was on the ordered writes list. However, since we always held the sd_ordered_lock spin_lock during the manipulation, we can just as easily check list_empty(&ip->i_ordered) instead. This allows us to keep more than one ordered writes list to make journal writing improvements. This patch eliminates GIF_ORDERED in favor of checking list_empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
58e08e8d |
|
09-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix trans slab error when withdraw occurs inside log_flush Log flush operations (gfs2_log_flush()) can target a specific transaction. But if the function encounters errors (e.g. io errors) and withdraws, the transaction was only freed it if was queued to one of the ail lists. If the withdraw occurred before the transaction was queued to the ail1 list, function ail_drain never freed it. The result was: BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown() This patch makes log_flush() add the targeted transaction to the ail1 list so that function ail_drain() will find and free it properly. Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
83d060ca |
|
04-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix use-after-free on transaction ail lists Before this patch, transactions could be merged into the system transaction by function gfs2_merge_trans(), but the transaction ail lists were never merged. Because the ail flushing mechanism can run separately, bd elements can be attached to the transaction's buffer list during the transaction (trans_add_meta, etc) but quickly moved to its ail lists. Later, in function gfs2_trans_end, the transaction can be freed (by gfs2_trans_end) while it still has bd elements queued to its ail lists, which can cause it to either lose track of the bd elements altogether (memory leak) or worse, reference the bd elements after the parent transaction has been freed. Although I've not seen any serious consequences, the problem becomes apparent with the previous patch's addition of: gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list)); to function gfs2_trans_free(). This patch adds logic into gfs2_merge_trans() to move the merged transaction's ail lists to the sdp transaction. This prevents the use-after-free. To do this properly, we need to hold the ail lock, so we pass sdp into the function instead of the transaction itself. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b839dada |
|
17-Apr-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: new slab for transactions This patch adds a new slab for gfs2 transactions. That allows us to reduce kernel memory fragmentation, have better organization of data for analysis of vmcore dumps. A new centralized function is added to free the slab objects, and it exposes use-after-free by giving warnings if a transaction is freed while it still has bd elements attached to its buffers or ail lists. We make sure to initialize those transaction ail lists so we can check their integrity when freeing. At a later time, we should add a slab initialization function to make it more efficient, but for this initial patch I wanted to minimize the impact. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
cbcc89b6 |
|
05-Jun-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: initialize transaction tr_ailX_lists earlier Since transactions may be freed shortly after they're created, before a log_flush occurs, we need to initialize their ail1 and ail2 lists earlier. Before this patch, the ail1 list was initialized in gfs2_log_flush(). This moves the initialization to the point when the transaction is first created. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
d5dc3d96 |
|
22-May-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: instrumentation wrt log_flush stuck This adds checks for gfs2_log_flush being stuck, similarly to the check in gfs2_ail1_flush. To faciliate this and make the strings easy to grep we move the ail1 emptying to its own function, empty_ail1_list. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f4e2f5e1 |
|
05-May-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Grab glock reference sooner in gfs2_add_revoke This patch rearranges gfs2_add_revoke so that the extra glock reference is added earlier on in the function to avoid races in which the glock is freed before the new reference is taken. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
d22f69a0 |
|
23-Apr-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix use-after-free in gfs2_logd after withdraw When the gfs2_logd daemon withdrew, the withdraw sequence called into make_fs_ro() to make the file system read-only. That caused the journal descriptors to be freed. However, those journal descriptors were used by gfs2_logd's call to gfs2_ail_flush_reqd(). This caused a use-after free and NULL pointer dereference. This patch changes function gfs2_logd() so that it stops all logd work until the thread is told to stop. Once a withdraw is done, it only does an interruptible sleep. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
75b46c43 |
|
27-Mar-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Fix oversight in gfs2_ail1_flush Ordinarily, function gfs2_ail1_start_one issues a write request for one item on the ail1 list, then returns -EBUSY. This makes the caller, gfs2_ail1_flush, loop around and start another. However, it was not clearing the -EBUSY return code each time through the loop. So on rare occasions, like when the wbc runs out of nr_to_write, it remained set to -EBUSY, which triggered an error and withdraw. This patch sets the return code to 0 each time through the restart loop so this won't happen anymore. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
9592ea80 |
|
25-Mar-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: instrumentation wrt ail1 stuck Before this patch, if the ail1 flush got stuck for some reason, there were no clues as to why. This patch introduces a check for getting stuck for more than a minute, and if it happens, it dumps the items still remaining on the ail1 list. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
969183bc |
|
03-Feb-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Switch to list_{first,last}_entry Replace open-coded versions of list_first_entry and list_last_entry with those functions. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
49003128 |
|
06-Mar-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Additional information when gfs2_ail1_flush withdraws Before this patch, if gfs2_ail1_flush gets an error from function gfs2_ail1_start_one (which comes indirectly from generic_writepages) the file system is withdrawn, but without any explanation why. This patch adds an error message if gfs2_ail1_flush gets an error from gfs2_ail1_start_one. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
ca399c96 |
|
08-Jan-2020 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: flesh out delayed withdraw for gfs2_log_flush Function gfs2_log_flush() had a few places where it tried to withdraw from the file system when errors were encountered. The problem is, it should delay those withdraws until the log flush lock is no longer held. This patch creates a new function just for delayed withdraws for situations like this. If errors=panic was specified on mount, we still want to do it the old fashioned way because the panic it does not help to delay in that situation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
2ca0c2fb |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: drain the ail2 list after io errors Before this patch, gfs2_logd continually tried to flush its journal log, after the file system is withdrawn. We don't want to write anything to the journal, lest we add corruption. Best course of action is to drain the ail1 into the ail2 list (via gfs2_ail1_empty) then drain the ail2 list with a new function, ail2_drain. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
b1676cbb |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages fails Before this patch, function gfs2_ail1_start_one would return any errors it received from write_cache_pages (except -EBUSY) but it did not withdraw. Since function gfs2_ail1_flush just checks for the bad return code and loops, the loop might potentially never end. This patch adds some logic to allow it to exit the loop and withdraw properly when errors are received from write_cache_pages. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
9ff78289 |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Before this patch, if gfs2_ail_empty_gl saw there was nothing on the ail list, it would return and not flush the log. The problem is that there could still be a revoke for the rgrp sitting on the sd_log_le_revoke list that's been recently taken off the ail list. But that revoke still needs to be written, and the rgrp_go_inval still needs to call log_flush_wait to ensure the revokes are all properly written to the journal before we relinquish control of the glock to another node. If we give the glock to another node before we have this knowledge, the node might crash and its journal replayed, in which case the missing revoke would allow the journal replay to replay the rgrp over top of the rgrp we already gave to another node, thus overwriting its changes and corrupting the file system. This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather than returning. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5e4c7632 |
|
21-Feb-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Issue revokes more intelligently Before this patch, function gfs2_write_revokes would call gfs2_ail1_empty, then traverse the sd_ail1_list looking for transactions that had bds which were no longer queued to a glock. And if it found some, it would try to issue revokes for them, up to a predetermined maximum. There were two problems with how it did this. First was the fact that gfs2_ail1_empty moves transactions which have nothing remaining on the ail1 list from the sd_ail1_list to the sd_ail2_list, thus making its traversal of sd_ail1_list miss them completely, and therefore, never issue revokes for them. Second was the fact that there were three traversals (or partial traversals) of the sd_ail1_list, each of which took and then released the sd_ail_lock lock: First inside gfs2_ail1_empty, second to determine if there are any revokes to be issued, and third to actually issue them. All this taking and releasing of the sd_ail_lock meant other processes could modify the lists and the conditions in which we're working. This patch simplies the whole process by adding a new parameter to function gfs2_ail1_empty, max_revokes. For normal calls, this is passed in as 0, meaning we don't want to issue any revokes. For function gfs2_write_revokes, we pass in the maximum number of revokes we can, thus allowing gfs2_ail1_empty to add the revokes where needed. This simplies the code, allows for a single holding of the sd_ail_lock, and allows gfs2_ail1_empty to add revokes for all the necessary bd items without missing any. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
036330c9 |
|
10-Apr-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: log error reform Before this patch, gfs2 kept track of journal io errors in two places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags. This patch consolidates the two into sd_log_error so that it reflects the first error encountered writing to the journal. In future patches, we will take advantage of this by checking this value rather than having to check both when reacting to io errors. In addition, this fixes a tight loop in unmount: If buffers get on the ail1 list and an io error occurs elsewhere, the ail1 list would never be cleared because they were always busy. So unmount would hang, waiting for the ail1 list to empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
30fe70a8 |
|
13-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: clear ail1 list when gfs2 withdraws This patch fixes a bug in which function gfs2_log_flush can get into an infinite loop when a gfs2 file system is withdrawn. The problem is the infinite loop "for (;;)" in gfs2_log_flush which would never finish because the io error and subsequent withdraw prevented the items from being taken off the ail list. This patch tries to clean up the mess by allowing withdraw situations to move not-in-flight buffer_heads to the ail2 list, where they will be dealt with later. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
69511080 |
|
12-Feb-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Introduce concept of a pending withdraw File system withdraws can be delayed when inconsistencies are discovered when we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be dequeued and demoted while there are still buffers that haven't been properly revoked, due to io errors writing to the journal. This patch introduces a new concept of a pending withdraw, which means an inconsistency has been discovered and we need to withdraw at the earliest possible opportunity. In these cases, we aren't quite withdrawn yet, but we still need to not dequeue glocks and other critical things. If we dequeue the glocks and the withdraw results in our journal being replayed, the replay could overwrite data that's been modified by a different node that acquired the glock in the meantime. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
badb55ec |
|
23-Jan-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Split gfs2_lm_withdraw into two functions Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
a31b4ec5 |
|
20-Jan-2020 |
Bob Peterson <rpeterso@redhat.com> |
Revert "gfs2: eliminate tr_num_revoke_rm" This reverts commit e955537e3262de8e56f070b13817f525f472fa00. Before patch e955537e32, tr_num_revoke tracked the number of revokes added to the transaction, and tr_num_revoke_rm tracked how many revokes were removed. But since revokes are queued off the sdp (superblock) pointer, some transactions could remove more revokes than they added. (e.g. revokes added by a different process). Commit e955537e32 eliminated transaction variable tr_num_revoke_rm, but in order to do so, it changed the accounting to always use tr_num_revoke for its math. Since you can remove more revokes than you add, tr_num_revoke could now become a negative value. This negative value broke the assert in function gfs2_trans_end: if (gfs2_assert_withdraw(sdp, (nbuf <=3D tr->tr_blocks) && (tr->tr_num_revoke <=3D tr->tr_revokes))) One way to fix this is to simply remove the tr_num_revoke clause from the assert and allow the value to become negative. Andreas didn't like that idea, so instead, we decided to revert e955537e32. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
5d439758 |
|
09-Jan-2020 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix incorrect variable name Rename sd_log_commited_revoke to sd_log_committed_revoke. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
2e9eeaa1 |
|
13-Dec-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: eliminate ssize parameter from gfs2_struct2blk Every caller of function gfs2_struct2blk specified sizeof(u64). This patch eliminates the unnecessary parameter and replaces the size calculation with a new superblock variable that is computed to be the maximum number of block pointers we can fit inside a log descriptor, as is done for pointers per dinode and indirect block. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
ade48088 |
|
20-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Don't write log headers after file system withdraw Before this patch, when a node withdrew a gfs2 file system, it wrote a (clean) unmount log header. That's wrong. You don't want to write anything to the journal once you're withdrawn because that's acknowledging that the transaction is complete and the journal is in good shape, neither of which may be a valid assumption when the file system is withdrawn. This is especially true if the withdraw was caused due to io errors writing to the journal in the first place. The best course of action is to leave the journal "as is" until it may be safely replayed during journal recovery, regardless of whether it's done by this node or another. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
f155f5e0 |
|
14-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix infinite loop in gfs2_ail1_flush on io error Before this patch, an IO error encountered in function gfs2_ail1_flush would cause a deadlock: because of the io error (and its resulting withdrawn state), buffers stopped being written to the journal. Buffers would remain on the ail1 list, so gfs2_ail1_start_one would return 1 to indicate dirty buffers were still on the ail1 list. However, when function gfs2_ail1_flush got a non-zero return code, it would goto restart to retry the writes, which meant it would never finish, and thus the infinite loop. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
fe5e7ba1 |
|
14-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: fix glock reference problem in gfs2_trans_remove_revoke Commit 9287c6452d2b fixed a situation in which gfs2 could use a glock after it had been freed. To do that, it temporarily added a new glock reference by calling gfs2_glock_hold in function gfs2_add_revoke. However, if the bd element was removed by gfs2_trans_remove_revoke, it failed to drop the additional reference. This patch adds logic to gfs2_trans_remove_revoke to properly drop the additional glock reference. Fixes: 9287c6452d2b ("gfs2: Fix occasional glock use-after-free") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
feed98a8 |
|
14-Nov-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: make gfs2_log_shutdown static Function gfs2_log_shutdown is only called from within log.c. This patch removes the extern declaration and makes it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
19ebc050 |
|
28-Aug-2019 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Remove active journal side effect from gfs2_write_log_header Function gfs2_write_log_header can be used to write a log header into any of the journals of a filesystem. When used on the node's own journal, gfs2_write_log_header advances the current position in the log (sdp->sd_log_flush_head) as a side effect, through function gfs2_log_bmap. This is confusing, and it also means that we can't use gfs2_log_bmap for other journals even if they have an extent map. So clean this mess up by not advancing sdp->sd_log_flush_head in gfs2_write_log_header or gfs2_log_bmap anymore and making that a responsibility of the callers instead. This is related to commit 7c70b896951c ("gfs2: clean_journal improperly set sd_log_flush_head"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
e955537e |
|
26-Mar-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: eliminate tr_num_revoke_rm For its journal processing, gfs2 kept track of the number of buffers added and removed on a per-transaction basis. These values are used to calculate space needed in the journal. But while these calculations make sense for the number of buffers, they make no sense for revokes. Revokes are managed in their own list, linked from the superblock. So it's entirely unnecessary to keep separate per-transaction counts for revokes added and removed. A single count will do the same job. Therefore, this patch combines the transaction revokes into a single count. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
638803d4 |
|
06-Jun-2019 |
Bob Peterson <rpeterso@redhat.com> |
Revert "gfs2: Replace gl_revokes with a GLF flag" Commit 73118ca8baf7 introduced a glock reference counting bug in gfs2_trans_remove_revoke. Given that, replacing gl_revokes with a GLF flag is no longer useful, so revert that commit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
7336d0e6 |
|
31-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398 Based on 1 normalized pattern(s): this copyrighted material is made available to anyone wishing to use modify copy or redistribute it subject to the terms and conditions of the gnu general public license version 2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 44 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f4686c26 |
|
02-May-2019 |
Abhi Das <adas@redhat.com> |
gfs2: read journal in large chunks Use bios to read in the journal into the address space of the journal inode (jd_inode), sequentially and in large chunks. This is faster for locating the journal head that the previous binary search approach. When performing recovery, we keep the journal in the address space until recovery is done, which further speeds up things. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
a5b1d3fc |
|
04-Apr-2019 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Rename sd_log_le_{revoke,ordered} Rename sd_log_le_revoke to sd_log_revokes and sd_log_le_ordered to sd_log_ordered: not sure what le stands for here, but it doesn't add clarity, and if it stands for list entry, it's actually confusing as those are both list heads but not list entries. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
73118ca8 |
|
04-Apr-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Replace gl_revokes with a GLF flag The gl_revokes value determines how many outstanding revokes a glock has on the superblock revokes list; this is used to avoid unnecessary log flushes. However, gl_revokes is only ever tested for being zero, and it's only decremented in revoke_lo_after_commit, which removes all revokes from the list, so we know that the gl_revoke values of all the glocks on the list will reach zero. Therefore, we can replace gl_revokes with a bit flag. This saves an atomic counter in struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
9287c645 |
|
04-Apr-2019 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Fix occasional glock use-after-free This patch has to do with the life cycle of glocks and buffers. When gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata object is assigned to track the buffer, and that is queued to various lists, including the glock's gl_ail_list to indicate it's on the active items list. Once the page associated with the buffer has been written, it is removed from the ail list, but its life isn't over until a revoke has been successfully written. So after the block is written, its bufdata object is moved from the glock's gl_ail_list to a file-system-wide list of pending revokes, sd_log_le_revoke. At that point the glock still needs to track how many revokes it contributed to that list (in gl_revokes) so that things like glock go_sync can ensure all the metadata has been not only written, but also revoked before the glock is granted to a different node. This is to guarantee journal replay doesn't replay the block once the glock has been granted to another node. Ross Lagerwall recently discovered a race in which an inode could be evicted, and its glock freed after its ail list had been synced, but while it still had unwritten revokes on the sd_log_le_revoke list. The evict decremented the glock reference count to zero, which allowed the glock to be freed. After the revoke was written, function revoke_lo_after_commit tried to adjust the glock's gl_revokes counter and clear its GLF_LFLUSH flag, at which time it referenced the freed glock. This patch fixes the problem by incrementing the glock reference count in gfs2_add_revoke when the glock's first bufdata object is moved from the glock to the global revokes list. Later, when the glock's last such bufdata object is freed, the reference count is decremented. This guarantees that whichever process finishes last (the revoke writing or the evict) will properly free the glock, and neither will reference the glock after it has been freed. Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
7c70b896 |
|
25-Mar-2019 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: clean_journal improperly set sd_log_flush_head This patch fixes regressions in 588bff95c94efc05f9e1a0b19015c9408ed7c0ef. Due to that patch, function clean_journal was setting the value of sd_log_flush_head, but that's only valid if it is replaying the node's own journal. If it's replaying another node's journal, that's completely wrong and will lead to multiple problems. This patch tries to clean up the mess by passing the value of the logical journal block number into gfs2_write_log_header so the function can treat non-owned journals generically. For the local journal, the journal extent map is used for best performance. For other nodes from other journals, new function gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get. This patch also tries to establish more consistency when passing journal block parameters by changing several unsigned int types to a consistent u32. Fixes: 588bff95c94e ("GFS2: Reduce code redundancy writing log headers") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
23e93c9b |
|
13-Feb-2019 |
Bob Peterson <rpeterso@redhat.com> |
Revert "gfs2: read journal in large chunks to locate the head" This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cbbe76c8 |
|
16-Nov-2018 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: Remove vestigial bd_ops Field bd_ops was set but never used, so I removed it, and all code supporting it. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
2a5f14f2 |
|
09-Nov-2018 |
Abhi Das <adas@redhat.com> |
gfs2: read journal in large chunks to locate the head Use bio(s) to read in the journal sequentially in large chunks and locate the head of the journal. This version addresses the issues Christoph pointed out w.r.t error handling and using deprecated API. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
|
#
5b846095 |
|
09-Nov-2018 |
Abhi Das <adas@redhat.com> |
gfs2: changes to gfs2_log_XXX_bio Change gfs2_log_XXX_bio family of functions so they can be used with different bios, not just sdp->sd_log_bio. This patch also contains some clean up suggested by Andreas. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
c9e58fb2 |
|
14-Oct-2018 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: write revokes should traverse sd_ail1_list in reverse All the other functions that deal with the sd_ail_list run the list from the tail back to the head, iow, in reverse. We should do the same while writing revokes, otherwise we might miss removing entries properly from the list when we hit the limit of how many revokes we can write at one time (based on block size, which determines how many block pointers will fit in the revoke block). Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
b524abcc |
|
04-Oct-2018 |
Bob Peterson <rpeterso@redhat.com> |
gfs2: slow the deluge of io error messages When an io error is hit, it calls gfs2_io_error_bh_i for every journal buffer it can't write. Since we changed gfs2_io_error_bh_i recently to withdraw later in the cycle, it sends a flood of errors to the console. This patch checks for the file system already being withdrawn, and if so, doesn't send more messages. It doesn't stop the flood of messages, but it slows it down and keeps it more reasonable. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
ee9c7f9a |
|
20-Jun-2018 |
Arnd Bergmann <arnd@arndb.de> |
gfs2: call ktime_get_coarse_real_ts64() directly current_kernel_time64() is now just a deprecated wrapper around ktime_get_coarse_real_ts64(), so let's just call that directly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
9e1a9ecd |
|
07-Jun-2018 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Don't withdraw under a spin lock In two places, the gfs2_io_error_bh macro is called while holding the sd_ail_lock spin lock. This isn't allowed because gfs2_io_error_bh withdraws the filesystem, which can sleep because it issues a uevent. To fix that, add a gfs2_io_error_bh_wd macro that does withdraw the filesystem and change gfs2_io_error_bh to not withdraw the filesystem. In those places where the new gfs2_io_error_bh is used, withdraw the filesystem after releasing sd_ail_lock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
|
#
9bc980cd |
|
02-Mar-2018 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Make function gfs2_remove_from_ail static Function gfs2_remove_from_ail is only ever used from log.c, so there is no reason to declare it extern. This patch removes the extern and declares it static. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
805c0907 |
|
08-Jan-2018 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Log the reason for log flushes in every log header This patch just adds the capability for GFS2 to track which function called gfs2_log_flush. This should make it easier to diagnose problems based on the sequence of events found in the journals. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
c1696fb8 |
|
16-Jan-2018 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Introduce new gfs2_log_header_v2 This patch adds a new structure called gfs2_log_header_v2 which is used to store expanded fields into previously unused areas of the log headers (i.e., this change is backwards compatible). Some of these are used for debug purposes so we can backtrack when problems occur. Others are reserved for future expansion. This patch is based on a prototype from Steve Whitehouse. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
#
0ff5916a |
|
16-Jan-2018 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Get rid of gfs2_log_header_in Get rid of gfs2_log_header_in by integrating it into get_log_header. Clean up the crc32 computations and use the same functions for encoding and decoding to make things less confusing. Eliminate lh_hash from gfs2_log_header_host which is completely useless. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
1f23bc78 |
|
22-Dec-2017 |
Abhi Das <adas@redhat.com> |
gfs2: Trim the ordered write list in gfs2_ordered_write() We iterate through the entire ordered writes list in gfs2_ordered_write() to write out inodes. It's a good place to try and shrink the list by throwing out inodes that don't have any pages. Signed-off-by: Abhi Das <adas@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
588bff95 |
|
17-Dec-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Reduce code redundancy writing log headers Before this patch, there was a lot of code redundancy between functions log_write_header (which uses bio) and clean_journal (which uses buffer_head). This patch reduces the redundancy to simplify the code and make log header writing more consistent. We want more consistency and reduced redundancy because we plan to add a bunch of new fields to improve performance (by eliminating the local statfs and quota files) improve metadata integrity (by adding new crcs and such) and for better debugging (by adding new fields to track when and where metadata was pushed through the journals.) We don't want to duplicate setting these new fields, nor allow for human error in the process. This reduction in code redundancy is accomplished by introducing a new helper function, gfs2_write_log_header which uses bio rather than bh. That simplifies recovery function clean_journal() to use the new helper function and iomap rather than redundancy and block_map (and eventually we can maybe remove block_map). It also reduces our dependency on buffer_heads. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
942b0cdd |
|
16-Aug-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Withdraw for IO errors writing to the journal or statfs Before this patch, if GFS2 encountered IO errors while writing to the journal, it would not report the problem, so they would go unnoticed, sometimes for many hours. Sometimes this would only be noticed later, when recovery tried to do journal replay and failed due to invalid metadata at the blocks that resulted in IO errors. This patch makes GFS2's log daemon check for IO errors. If it encounters one, it withdraws from the file system and reports why in dmesg. A similar action is taken when IO errors occur when writing to the system statfs file. These errors are also reported back to any callers of fsync, since that requires the journal to be flushed. Therefore, any IO errors that would previously go unnoticed are now noticed and the file system is withdrawn as early as possible, thus preventing further file system damage. Also note that this reintroduces superblock variable sd_log_error, which Christoph removed with commit f729b66fca. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
b066a4eeb |
|
03-Aug-2017 |
Abhi Das <adas@redhat.com> |
gfs2: forcibly flush ail to relieve memory pressure On systems with low memory, it is possible for gfs2 to infinitely loop in balance_dirty_pages() under heavy IO (creating sparse files). balance_dirty_pages() attempts to write out the dirty pages via gfs2_writepages() but none are found because these dirty pages are being used by the journaling code in the ail. Normally, the journal has an upper threshold which when hit triggers an automatic flush of the ail. But this threshold can be higher than the number of allowable dirty pages and result in the ail never being flushed. This patch forces an ail flush when gfs2_writepages() fails to write anything. This is a good indication that the ail might be holding some dirty pages. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
722f6f62 |
|
20-Jun-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Eliminate vestigial sd_log_flush_wrapped Superblock variable sd_log_flush_wrapped is set, but never referenced, so this patch eliminates it. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
0f0b9b63 |
|
02-May-2017 |
Jan Kara <jack@suse.cz> |
gfs2: Make flush bios explicitely sync Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: Steven Whitehouse <swhiteho@redhat.com> CC: cluster-devel@redhat.com CC: stable@vger.kernel.org Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
9862ca05 |
|
24-Jan-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Switch tr_touched to flag in transaction This patch eliminates the int variable tr_touched in favor of a new flag in the transaction. This is a step toward reducing contention on the gfs2_log_lock spin_lock. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
b63f5e84 |
|
06-Jan-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Wake up io waiters whenever a flush is done Before this patch, if a process called function gfs2_log_reserve to reserve some journal blocks, but the journal not enough blocks were free, it would call io_schedule. However, in the log flush daemon, it woke up the waiters only if an gfs2_ail_flush was no longer required. This resulted in situations where processes would wait forever because the number of blocks required was so high that it pushed the journal into a perpetual state of flush being required. This patch changes the logd daemon so that it wakes up io waiters every time the log is actually flushed. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
f07b3520 |
|
05-Jan-2017 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Made logd daemon take into account log demand Before this patch, the logd daemon only tried to flush things when the log blocks pinned exceeded a certain threshold. But when we're deleting very large files, it may require a huge number of journal blocks, and that, in turn, may exceed the threshold. This patch factors that into account. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
70fd7614 |
|
01-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
e1b1afa6 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
gfs2: use bio op accessors Separate the op from the rq_flag_bits and have gfs2 set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
400ac52e |
|
09-Dec-2015 |
Benjamin Marzinski <bmarzins@redhat.com> |
gfs2: clear journal live bit in gfs2_log_flush When gfs2 was unmounting filesystems or changing them to read-only it was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This caused a race. If an inode glock got demoted in the gap between clearing the bit and the shutdown flush, it would be unable to reserve log space to clear out the active items list in inode_go_sync, causing an error in inode_go_inval because the glock was still dirty. To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the shutdown log flush. This means that, because of the locking on the log blocks, either inode_go_sync will be able to reserve space to clean the glock before the shutdown flush, or the shutdown flush will clean the glock itself, before inode_go_sync fails to reserve the space. Either way, the glock will be clean before inode_go_inval. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
2e60d768 |
|
13-Nov-2014 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: update freeze code to use freeze/thaw_super on all nodes The current gfs2 freezing code is considerably more complicated than it should be because it doesn't use the vfs freezing code on any node except the one that begins the freeze. This is because it needs to acquire a cluster glock before calling the vfs code to prevent a deadlock, and without the new freeze_super and thaw_super hooks, that was impossible. To deal with the issue, gfs2 had to do some hacky locking tricks to make sure that a frozen node couldn't be holding on a lock it needed to do the unfreeze ioctl. This patch makes use of the new hooks to simply the gfs2 locking code. Now, all the nodes in the cluster freeze and thaw in exactly the same way. Every node in the cluster caches the freeze glock in the shared state. The new freeze_super hook allows the freezing node to grab this freeze glock in the exclusive state without first calling the vfs freeze_super function. All the nodes in the cluster see this lock change, and call the vfs freeze_super function. The vfs locking code guarantees that the nodes can't get stuck holding the glocks necessary to unfreeze the system. To unfreeze, the freezing node uses the new thaw_super hook to drop the freeze glock. Again, all the nodes notice this, reacquire the glock in shared mode and call the vfs thaw_super function. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
24972557 |
|
01-May-2014 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: remove transaction glock GFS2 has a transaction glock, which must be grabbed for every transaction, whose purpose is to deal with freezing the filesystem. Aside from this involving a large amount of locking, it is very easy to make the current fsfreeze code hang on unfreezing. This patch rewrites how gfs2 handles freezing the filesystem. The transaction glock is removed. In it's place is a freeze glock, which is cached (but not held) in a shared state by every node in the cluster when the filesystem is mounted. This lock only needs to be grabbed on freezing, and actions which need to be safe from freezing, like recovery. When a node wants to freeze the filesystem, it grabs this glock exclusively. When the freeze glock state changes on the nodes (either from shared to unlocked, or shared to exclusive), the filesystem does a special log flush. gfs2_log_flush() does all the work for flushing out the and shutting down the incore log, and then it tries to grab the freeze glock in a shared state again. Since the filesystem is stuck in gfs2_log_flush, no new transaction can start, and nothing can be written to disk. Unfreezing the filesytem simply involes dropping the freeze glock, allowing gfs2_log_flush() to grab and then release the shared lock, so it is cached for next time. However, in order for the unfreezing ioctl to occur, gfs2 needs to get a shared lock on the filesystem root directory inode to check permissions. If that glock has already been grabbed exclusively, fsfreeze will be unable to get the shared lock and unfreeze the filesystem. In order to allow the unfreeze, this patch makes gfs2 grab a shared lock on the filesystem root directory during the freeze, and hold it until it unfreezes the filesystem. The functions which need to grab a shared lock in order to allow the unfreeze ioctl to be issued now use the lock grabbed by the freeze code instead. The freeze and unfreeze code take care to make sure that this shared lock will not be dropped while another process is using it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
428fd95d |
|
12-Mar-2014 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Re-add a call to log_flush_wait when flushing the journal Upstream commit 34cc178 changed a line of code from calling function log_flush_commit to calling log_write_header. This had the effect of eliminating a call to function log_flush_wait. That causes the journal to skip over log headers, which results in multiple wrap points, which itself leads to infinite loops in journal replay, both in the kernel code and fsck.gfs2 code. This patch re-adds that call. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b1ab1e44 |
|
25-Feb-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Remove extra "if" in gfs2_log_flush() By reordering some of the assignments in gfs2_log_flush() it is possible to remove one of the "if" statements as it can be merged with one higher up the function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
022ef4fe |
|
21-Feb-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Move log buffer accounting to transaction Now we have a master transaction into which other transactions are merged, the accounting can be done using this master transaction. We no longer require the superblock fields which were being used for this function. In addition, this allows for a clean up in calc_reserved() making it rather easier understand. Also, by reducing the number of variables used to track the buffers being added and removed from the journal, a number of error checks are now no longer required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d69a3c65 |
|
21-Feb-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Move log buffer lists into transaction Over time, we hope to be able to improve the concurrency available in the log code. This is one small step towards that, by moving the buffer lists from the super block, and into the transaction structure, so that each transaction builds its own buffer lists. At transaction commit time, the buffer lists are merged into the currently accumulating transaction. That transaction then is passed into the before and after commit functions at journal flush time. Thus there should be no change in overall behaviour yet. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
885bceca |
|
03-Feb-2014 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Plug on AIL flush When we do a flush of the AIL list, we are writing out what is likely to be a lot of small I/Os, which are possibly in an order which is not ideal performance-wise. Since this is done by calling filemap_fdatatwrite for each individual inode's address space there is no overall plugging going on. In addition to that, we do not always wait for AIL i/o when we flush it, so that it is possible for things to get left behind on the queue. By adding explicit plugging here, we reduce the chances of this being an issues. A quick test using the AIL flush tracepoint shows a small, but measurable improvement. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
9290a9a7 |
|
09-Dec-2013 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Fix use-after-free race when calling gfs2_remove_from_ail Function gfs2_remove_from_ail drops the reference on the bh via brelse. This patch fixes a race condition whereby bh is deferenced after the brelse when setting bd->bd_blkno = bh->b_blocknr; Under certain rare circumstances, bh might be gone or reused, and bd->bd_blkno is set to whatever that memory happens to be, which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails the test "bd->bd_blkno >= blkno" which causes it to never be freed. The end result is that the bd is never freed from the bufdata cache, which results in this error: slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5d054964 |
|
14-Jun-2013 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: aggressively issue revokes in gfs2_log_flush This patch looks at all the outstanding blocks in all the transactions on the log, and moves the completed ones to the ail2 list. Then it issues revokes for these blocks. This will hopefully speed things up in situations where there is a lot of contention for glocks, especially if they are acquired serially. revoke_lo_before_commit will issue at most one log block's full of these preemptive revokes. The amount of reserved log space that gfs2_log_reserve() ignores has been incremented to allow for this extra block. This patch also consolidates the common revoke instructions into one function, gfs2_add_revoke(). Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
16ca9412 |
|
05-Apr-2013 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: replace gfs2_ail structure with gfs2_trans In order to allow transactions and log flushes to happen at the same time, gfs2 needs to move the transaction accounting and active items list code into the gfs2_trans structure. As a first step toward this, this patch removes the gfs2_ail structure, and handles the active items list in the gfs_trans structure. This keeps gfs2 from allocating an ail structure on log flushes, and gives us a struture that can later be used to store the transaction accounting outside of the gfs2 superblock structure. With this patch, at the end of a transaction, gfs2 will add the gfs2_trans structure to the superblock if there is not one already. This structure now has the active items fields that were previously in gfs2_ail. This is not necessary in the case where the transaction was simply used to add revokes, since these are never written outside of the journal, and thus, don't need an active items list. Also, in order to make sure that the transaction structure is not removed while it's still in use by gfs2_trans_end, unlocking the sd_log_flush_lock has to happen slightly later in ending the transaction. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
45138990 |
|
28-Jan-2013 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use ->writepages for ordered writes Instead of using a list of buffers to write ahead of the journal flush, this now uses a list of inodes and calls ->writepages via filemap_fdatawrite() in order to achieve the same thing. For most use cases this results in a shorter ordered write list, as well as much larger i/os being issued. The ordered write list is sorted by inode number before writing in order to retain the disk block ordering between inodes as per the previous code. The previous ordered write code used to conflict in its assumptions about how to write out the disk blocks with mpage_writepages() so that with this updated version we can also use mpage_writepages() for GFS2's ordered write, writepages implementation. So we will also send larger i/os from writeback too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c0752aa7 |
|
30-Apr-2012 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: eliminate log elements and simplify This patch eliminates the gfs2_log_element data structure and rolls its two components into the gfs2_bufdata. This makes the code easier to understand and makes it easier to migrate to a rbtree to keep the list sorted. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
144a4c2f |
|
19-Apr-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Log code fixes This patch removes a log lock from around atomic operation where it is not needed, removes an unused variable, and also changes a void pointer used incorrectly to a struct page pointer. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c50b91c4 |
|
16-Apr-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Remove bd_list_tr This is another clean up in the logging code. This per-transaction list was largely unused. Its main function was to ensure that the number of buffers in a transaction was correct, however that counter was only used to check the number of buffers in the bd_list_tr, plus an assert at the end of each transaction. With the assert now changed to use the calculated buffer counts, we can remove both bd_list_tr and its associated counter. This should make the code easier to understand as well as shrinking a couple of structures. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e8c92ed7 |
|
16-Apr-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Clean up log write code path Prior to this patch, we have two ways of sending i/o to the log. One of those is used when we need to allocate both the data to be written itself and also a buffer head to submit it. This is done via sb_getblk and friends. This is used mostly for writing log headers. The other method is used when writing blocks which have some in-place counterpart. This is the case for all the metadata blocks which are journalled, and when journaled data is in use, for unescaped journalled data blocks. This patch replaces both of those two methods, and about half a dozen separate i/o submission points with a single i/o submission function. We also go direct to bio rather than using buffer heads, since this allows us to build i/o requests of the maximum size for the block device in question. It also reduces the memory required for flushing the log, which can be very useful in low memory situations. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fdb76a42 |
|
02-Apr-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Drop "pull" argument from log_write_header() The "pull" argument to log_write_header() is only used for debug purposes and it is not really needed any more. There are other tests for this particular problem, so I think we can dispose of it in order to simplify the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
34cc1781 |
|
09-Mar-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Clean up log flush header writing We already send both a pre and post flush to the block device when writing a journal header. There is no need to wait for the previous I/O specifically when we do this, unless we've turned "barriers" off. As a side effect, this also cleans up the code path for flushing the journal and makes it more readable. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
08728f2d |
|
21-Feb-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Make bd_cmp() static Add missing static to bd_cmp() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
4a36d08d |
|
14-Feb-2012 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Sort the ordered write list This patch sorts the ordered write list for GFS2 writes. This increases the throughput for simultaneous writes. For example, if you have ten processes, all doing: dd if=/dev/zero of=/mnt/gfs2/fileX on different files, the throughput will be much better. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
47ac5537 |
|
03-Feb-2012 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Move two functions from log.c to lops.c gfs2_log_get_buf() and gfs2_log_fake_buf() are both used only in lops.c, so move them next to their callers and they can then become static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a0acae0e |
|
21-Nov-2011 |
Tejun Heo <tj@kernel.org> |
freezer: unexport refrigerator() and update try_to_freeze() slightly There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or removes any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
|
#
20ed0535 |
|
31-Oct-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix up REQ flags Christoph has split up REQ_PRIO from REQ_META. That means that we can drop REQ_PRIO from places where is it not needed. I'm not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO makes any kind of sense, anyway. In addition, I've added REQ_META to one place in the code where it was missing. REQ_PRIO has been left for read/writes triggered by glock acquisition and writeback only. We can adjust it again if required, but these are the most important points from a performance perspective. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
|
#
65299a3b |
|
23-Aug-2011 |
Christoph Hellwig <hch@infradead.org> |
block: separate priority boosting from REQ_META Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
380f7c65 |
|
14-Jul-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Resolve inode eviction and ail list interaction bug This patch contains a few misc fixes which resolve a recently reported issue. This patch has been a real team effort and has received a lot of testing. The first issue is that the ail lock needs to be held over a few more operations. The lock thats added into gfs2_releasepage() may possibly be a candidate for replacing with RCU at some future point, but at this stage we've gone for the obvious fix. The second issue is that gfs2_write_inode() can end up calling a glock recursively when called from gfs2_evict_inode() via the syncing code, so it needs a guard added. The third issue is that we either need to not truncate the metadata pages of inodes which have zero link count, but which we cannot deallocate due to them still being in use by other nodes, or we need to ensure that those pages have all made it through the journal and ail lists first. This patch takes the former approach, but the latter has also been tested and there is nothing to choose between them performance-wise. So again, we could revise that decision in the future. Also, the inode eviction process is now better documented. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Bob Peterson <rpeterso@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Reported-by: Barry J. Marson <bmarson@redhat.com> Reported-by: David Teigland <teigland@redhat.com>
|
#
26b06a69 |
|
21-May-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Wait properly when flushing the ail list The ail flush code has always relied upon log flushing to prevent it from spinning needlessly. This fixes it to wait on the last I/O request submitted (we don't need to wait for all of it) instead of either spinning with io_schedule or sleeping. As a result cpu usage of gfs2_logd is much reduced with certain workloads. Reported-by: Abhijith Das <adas@redhat.com> Tested-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
4f1de018 |
|
26-Apr-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Fix ail list traversal In the recent patches to update the AIL list code, I managed to forget that the ail list lock got dropped, even though I added a comment specifically to remind myself :( Reported-by: Barry Marson <bmarson@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c83ae9ca |
|
18-Apr-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add an AIL writeback tracepoint Add a tracepoint for monitoring writeback of the AIL. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
4667a0ec |
|
18-Apr-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Make writeback more responsive to system conditions This patch adds writeback_control to writing back the AIL list. This means that we can then take advantage of the information we get in ->write_inode() in order to set off some pre-emptive writeback. In addition, the AIL code is cleaned up a bit to make it a bit simpler to understand. There is still more which can usefully be done in this area, but this is a good start at least. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5ac048bb |
|
30-Mar-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Use filemap_fdatawrite() to write back the AIL In order to ensure that the mapping stats (and thus the bdi) are correctly updated, this patch changes the AIL writeback to use the filemap_datawrite function. This helps prevent stalls in balance_dirty_pages() due to large amounts of dirty metadata when there is little or no dirty data around. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c618e87a |
|
13-Mar-2011 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Update to AIL list locking The previous patch missed a couple of places where the AIL list needed locking, so this fixes up those places, plus a comment is corrected too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Chinner <dchinner@redhat.com>
|
#
d6a079e8 |
|
11-Mar-2011 |
Dave Chinner <dchinner@redhat.com> |
GFS2: introduce AIL lock The log lock is currently used to protect the AIL lists and the movements of buffers into and out of them. The lists are self contained and no log specific items outside the lists are accessed when starting or emptying the AIL lists. Hence the operation of the AIL does not require the protection of the log lock so split them out into a new AIL specific lock to reduce the amount of traffic on the log lock. This will also reduce the amount of serialisation that occurs when the gfs2_logd pushes on the AIL to move it forward. This reduces the impact of log pushing on sequential write throughput. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
721a9602 |
|
09-Mar-2011 |
Jens Axboe <jaxboe@fusionio.com> |
block: kill off REQ_UNPLUG With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
5f487490 |
|
09-Sep-2010 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: gfs2_logd should be using interruptible waits Looks like this crept in, in a recent update. Reported-by: Krzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f1e4d518 |
|
18-Aug-2010 |
Christoph Hellwig <hch@infradead.org> |
gfs2: replace barriers with explicit flush / FUA usage Switch to the WRITE_FLUSH_FUA flag for log writes, remove the EOPNOTSUPP detection for barriers and stop setting the barrier flag for discards. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
7b6d91da |
|
07-Aug-2010 |
Christoph Hellwig <hch@lst.de> |
block: unify flags for struct bio and struct request Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
41f2df62 |
|
17-Jun-2010 |
Christoph Hellwig <hch@lst.de> |
block: BARRIER request should imply SYNC A barrier request should by defintion have priority in get_request and let the queue be unplugged immediately as it's blocking all forward progress due to the queue draining. Most filesystems already get this implicitly by the way how submit_bh treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs and XFS are still forgetting to set the flag, as is blkdev_issue_flush and some places in DM/MD. For XFS on metadata heavy workloads this gives a consistent speedup in the 2-3% range. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
ed4878e8 |
|
20-May-2010 |
Bob Peterson <rpeterso@redhat.com> |
GFS2: Rework reclaiming unlinked dinodes The previous patch I wrote for reclaiming unlinked dinodes had some shortcomings and did not prevent all hangs. This version is much cleaner and more logical, and has passed very difficult testing. Sorry for the churn. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
913a71d2 |
|
06-May-2010 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add some useful messages The following patch adds a message to indicate when barriers have been disabled due to a block device which doesn't support them. You could already tell this via the mount options in /proc/mounts, but all the other filesystems also log a message at the same time. Also, the same mechanisms are used to indicate when the lock demote interface has been used (only ever used for debugging) which is a request from our support team. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5e687eac |
|
04-May-2010 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: Various gfs2_logd improvements This patch contains various tweaks to how log flushes and active item writeback work. gfs2_logd is now managed by a waitqueue, and gfs2_log_reseve now waits for gfs2_logd to do the log flushing. Multiple functions were rewritten to remove the need to call gfs2_log_lock(). Instead of using one test to see if gfs2_logd had work to do, there are now seperate tests to check if there are two many buffers in the incore log or if there are two many items on the active items list. This patch is a port of a patch Steve Whitehouse wrote about a year ago, with some minor changes. Since gfs2_ail1_start always submits all the active items, it no longer needs to keep track of the first ai submitted, so this has been removed. In gfs2_log_reserve(), the order of the calls to prepare_to_wait_exclusive() and wake_up() when firing off the logd thread has been switched. If it called wake_up first there was a small window for a race, where logd could run and return before gfs2_log_reserve was ready to get woken up. If gfs2_logd ran, but did not free up enough blocks, gfs2_log_reserve() would be left waiting for gfs2_logd to eventualy run because it timed out. Finally, gt_logd_secs, which controls how long to wait before gfs2_logd times out, and flushes the log, can now be set on mount with ar_commit. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
2e95e3f6 |
|
10-Mar-2010 |
Benjamin Marzinski <bmarzins@redhat.com> |
GFS2: Allow the number of committed revokes to temporarily be negative GFS2 tracks the number of revokes and unrevokes that are part of committed transactions via sd_log_commited_revoke. It is possible for one process to add revokes during its transaction, while another process unrevokes them during its transaction. If the second process finishes its transaction first, sd_log_commited_revoke will be decremented by the number of unrevokes that the second process did, without first being incremented by the number of revokes the first process did. This is fine, since all started transactions must be completed before the journal can be flushed. However, sd_log_commited_revoke is an unsigned integer, and log_refund() causes an assertion failure if it would go negative at the end of a transaction. This patch makes sd_log_commited_revoke a signed integer and allows it to go negative. __gfs2_log_flush() still checks that it mataches the actual number of revokes. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
0ab7d13f |
|
06-Nov-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Tag all metadata with jid There are two spare field in the header common to all GFS2 metadata. One is just the right size to fit a journal id in it, and this patch updates the journal code so that each time a metadata block is modified, we tag it with the journal id of the node which is performing the modification. The reason for this is that it should make it much easier to debug issues which arise if we can tell which node was the last to modify a particular metadata block. Since the field is updated before the block is written into the journal, each journal should only contain metadata which is tagged with its own journal id. The one exception to this is the journal header block, which might have a different node's id in it, if that journal was recovered by another node in the cluster. Thus each journal will contain a record of which nodes recovered it, via the journal header. The other field in the metadata header could potentially be used to hold information about what kind of operation was performed, but for the time being we just zero it on each transaction so that if we use it for that in future, we'll know that the information (where it exists) is reliable. I did consider using the other field to hold the journal sequence number, however since in GFS2's journaling we write the modified data into the journal and not the original data, this gives no information as to what action caused the modification, so I think we can probably come up with a better use for those 64 bits in the future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
63997775 |
|
12-Jun-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Add tracepoints This patch adds the ability to trace various aspects of the GFS2 filesystem. The trace points are divided into three groups, glocks, logging and bmap. These points have been chosen because they allow inspection of the major internal functions of GFS2 and they are also generic enough that they are unlikely to need any major changes as the filesystem evolves. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b7d245de |
|
27-Apr-2009 |
Christoph Hellwig <hch@infradead.org> |
gfs2: remove ->write_super and stop maintaining ->s_dirt Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c969f58c |
|
07-Apr-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Update the rw flags After Jens recent updates: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a1f242524c3c1f5d40f1c9c343427e34d1aadd6e et al. this is a patch to bring gfs2 uptodate with the core code. Also I've managed to squash another call to ll_rw_block() along the way. There is still one part of the GFS2 I/O paths which are not correctly annotated and that is due to the sharing of the writeback code between the data and metadata address spaces. I would like to change that too, but this patch is still worth doing on its own, I think. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f057f6cd |
|
12-Jan-2009 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Merge lock_dlm module into GFS2 This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
254db57f |
|
26-Sep-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
GFS2: Support for I/O barriers This patch adds barrier support to GFS2. There is not a lot of change really... we just add the barrier flag when we write journal header blocks. If the underlying device refuses to support them, we fall back to the previous way of doing things (wait for the I/O and hope) since there is nothing else we can do. There is no user configuration, barriers will always be on unless the device refuses to support them. This seems a reasonable solution to me since this is a correctness issue. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
2d81afb8 |
|
29-May-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
[GFS2] trivial sparse lock annotations Annotate the &sdp->sd_log_lock. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
62be1f71 |
|
17-Apr-2008 |
Roel Kluin <12o3l@tiscali.nl> |
[GFS2] fix assertion in log_refund() since unsigned, unused >= 0 is always true. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d0109bfa |
|
28-Jan-2008 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Only do lo_incore_commit once This patch is performance related. When we're doing a log flush, I noticed we were calling buf_lo_incore_commit twice: once for data bufs and once for metadata bufs. Since this is the same function and does the same thing in both cases, there should be no reason to call it twice. Since we only need to call it once, we can also make it faster by removing it from the generic "lops" code and making it a stand-along static function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ac39aadd |
|
10-Jan-2008 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix assert in log code Although the values were all being calculated correctly, there was a race in the assert due to the way it was using atomic variables. This changes the value we assert on so that we get the same effect by testing a different variable. This prevents the assert triggering when it shouldn't. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ff91cc9b |
|
14-Dec-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix log block mapper A missing offset in the calculation. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
da6dd40d |
|
11-Dec-2007 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Journal extent mapping This patch saves a little time when gfs2 writes to the journals by keeping a mapping between logical and physical blocks on disk. That's better than constantly looking up indirect pointers in buffers, when the journals are several levels of indirection (which they typically are). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e9e1ef2b |
|
10-Dec-2007 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Remove function gfs2_get_block This patch is just a cleanup. Function gfs2_get_block() just calls function gfs2_block_map reversing the last two parameters. By reversing the parameters, gfs2_block_map() may be called directly and function gfs2_get_block may be eliminated altogether. Since this function is done for every block operation, this streamlines the code and makes it a little bit more efficient. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1a2781cf |
|
16-Nov-2007 |
Fabio Massimo Di Nitto <fabbione@ubuntu.com> |
[GFS2] Fix runtime issue with UP kernels The issue is indeed UP vs SMP and it is totally random. spin_is_locked() is a bad assertion because there is no correct answer on UP. on UP spin_is_locked() has to return either one value or another, always. This means that in my setup I am lucky enough to trigger the issue and your you are lucky enough not to. the patch in attachment removes the bogus calls to BUG_ON and according to David (in CC and thanks for the long explanation on the problem) we can rely upon things like lockdep to find problem that might be trying to catch. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e35b9211 |
|
09-Nov-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Don't periodically update the jindex We only care about the content of the jindex in two cases, one is when we mount the fs and the other is when we need to recover another journal. In both cases we have to update the jindex anyway, so there is no point in updating it periodically between times, so this removes it to simplify gfs2_logd. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ec69b188 |
|
09-Nov-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Move gfs2_logd into log.c This means that we can mark gfs2_ail1_empty static and prepares the way for further changes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fd041f0b |
|
08-Nov-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Use atomic_t for journal free blocks counter This patch changes the counter which keeps track of the free blocks in the journal to an atomic_t in preparation for the following patch which will update the log reservation code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
2bcd610d |
|
08-Nov-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Don't add glocks to the journal The only reason for adding glocks to the journal was to keep track of which locks required a log flush prior to release. We add a flag to the glock to allow this check to be made in a simpler way. This reduces the size of a glock (by 12 bytes on i386, 24 on x86_64) and means that we can avoid extra work during the journal flush. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b8e7cbb6 |
|
17-Oct-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Add writepages for GFS2 jdata This patch resolves a lock ordering issue where we had been getting a transaction lock in the wrong order with respect to the page lock. By using writepages rather than just writepage, it is then possible to start a transaction before locking the page, and thus matching the locking order elsewhere in the code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f91a0d3e |
|
15-Oct-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove useless i_cache from inodes The i_cache was designed to keep references to the indirect blocks used during block mapping so that they didn't have to be looked up continually. The idea failed because there are too many places where the i_cache needs to be freed, and this has in the past been the cause of many bugs. In addition there was no performance benefit being gained since the disk blocks in question were cached anyway. So this patch removes it in order to simplify the code to prepare for other changes which would otherwise have had to add further support for this feature. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5a60c532 |
|
26-Sep-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Get superblock a different way The mapping may be NULL by the time the I/O has completed, so we now get the superblock by a different route (via the bd and glock) to avoid this problem. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Wendy Cheng <wcheng@redhat.com>
|
#
16615be1 |
|
17-Sep-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up journaled data writing This patch cleans up the code for writing journaled data into the log. It also removes the need to allocate a small "tag" structure for each block written into the log. Instead we just keep count of the outstanding I/O so that we can be sure that its all been written at the correct time. Another result of this patch is that a number of ll_rw_block() calls have become submit_bh() calls, closing some races at the same time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1ad38c43 |
|
03-Sep-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up gfs2_trans_add_revoke() The following alters gfs2_trans_add_revoke() to take a struct gfs2_bufdata as an argument. This eliminates the memory allocation which was previously required by making use of the already existing struct gfs2_bufdata. It makes some sanity checks to ensure that the gfs2_bufdata has been removed from all the lists before its recycled as a revoke structure. This saves one memory allocation and one free per revoke structure. Also as a result, and to simplify the locking, since there is no longer any blocking code in gfs2_trans_add_revoke() we must hold the log lock whenever this function is called. This reduces the amount of times we take and unlock the log lock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
d7b616e2 |
|
02-Sep-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up ordered write code The following patch removes the ordered write processing from databuf_lo_before_commit() and moves it to log.c. This has the effect of greatly simplyfying databuf_lo_before_commit() and well as potentially making the ordered write code more efficient. As a side effect of this, its now possible to remove ordered buffers from the ordered buffer list at any time, so we now make use of this in invalidatepage and releasepage to ensure timely release of these buffers. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
1e1a3d03 |
|
27-Aug-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Introduce gfs2_remove_from_ail This collects together the operations required to remove a gfs2_bufdata from the ail lists. Its only called from two places to start with, but expect to see more of this function in future. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bb3b0e3d |
|
16-Aug-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Clean up invalidatepage/releasepage This patch fixes some bugs relating to journaled data files by cleaning up the gfs2_invalidatepage() and gfs2_releasepage() functions. We now never block during gfs2_releasepage(), instead we always either release or refuse to release depending on the status of the buffers. This fixes Red Hat bugzillas #248969 and #252392. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
|
#
0f8468c8 |
|
25-Jul-2007 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Detach buf data during in-place writeback This is patch 5 of 5 for bug #248176 Metadata corruption was occurring because page references weren't being removed in all cases. I previously added a function called detach_bufdata, but I discovered there already WAS a function out there to do the job. It's called gfs2_meta_cache_flush. So I added a call to that to remove the page references. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
693ddeab |
|
24-Jul-2007 |
Bob Peterson <rpeterso@redhat.com> |
[GFS2] Revert part of earlier log.c changes This is patch 2 of 5 for bug #248176. The list_move code previously concocted in log.c for bug #238162 (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23) never runs as bh can now never be NULL at this point. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a0a24741 |
|
09-Jul-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Small fixes to logging code This reverts part of an earlier patch which tried to reclaim gfs2_bufdata structures too early and resulted in a "use after free" case (this bit from me). Also a change to not write out log headers unless we really need to (in the case of flushing nothing we don't need a header) from Bob. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
#
2332c443 |
|
18-Jun-2007 |
Robert Peterson <rpeterso@redhat.com> |
[GFS2] assertion failure after writing to journaled file, umount This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic point of the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was unplanned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is be adjusted properly. (Did I hear you call that a kludge? well, maybe, but a lot more justifiable than the one I removed). 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it is wrong). If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
8fb68595 |
|
12-Jun-2007 |
Robert Peterson <rpeterso@redhat.com> |
[GFS2] Journaled file write/unstuff bug This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ddf4b426 |
|
01-Jun-2007 |
Benjamin Marzinski <bmarzins@redhat.com> |
[GFS2] fix jdata issues This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
89918647 |
|
01-Jun-2007 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Make the log reserved blocks depend on block size The number of blocks which we reserve in the log at the start of each transaction needs to depends upon the block size since the overhead is related to the number of "pointers" which can be fitted into a single block. This relates to Red Hat bz #240435 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
aed3255f |
|
27-Nov-2006 |
Ryusuke Konishi <ryusuke@osrg.net> |
[GFS2] fs/gfs2/log.c:log_bmap() fix printk format warning Fix a printk format warning in fs/gfs2/log.c: fs/gfs2/log.c:322: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'sector_t' Signed-off-by: Ryusuke Konishi <ryusuke@osrg.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a25311c8 |
|
23-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Move gfs2_meta_syncfs() into log.c By moving gfs2_meta_syncfs() into log.c, gfs2_ail1_start() can be made static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b004157a |
|
23-Nov-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix journal flush problem This fixes a bug which resulted in poor performance due to flushing the journal too often. The code path in question was via the inode_go_sync() function in glops.c. The solution is not to flush the journal immediately when inodes are ejected from memory, but batch up the work for glockd to deal with later on. This means that glocks may now live on beyond the end of the lifetime of their inodes (but not very much longer in the normal case). Also fixed in this patch is a bug (which was hidden by the bug mentioned above) in calculation of the number of free journal blocks. The gfs2_logd process has been altered to be more responsive to the journal filling up. We now wake it up when the number of uncommitted journal blocks has reached the threshold level rather than trying to flush directly at the end of each transaction. This again means doing fewer, but larger, log flushes in general. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
23591256 |
|
13-Oct-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix bmap to map extents properly This fix means that bmap will map extents of the length requested by the VFS rather than guessing at it, or just mapping one block at a time. The other callers of gfs2_block_map are audited to ensure they send the correct max extent lengths (i.e. set bh->b_size correctly). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fe1a698f |
|
11-Oct-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix bug where lock not held The log lock needs to be held when manipulating the counter for the number of free journal blocks. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ddacfaf7 |
|
03-Oct-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Move logging code into log.c (mostly) This moves the logging code from meta_io.c into log.c and glops.c. As a result the routines can now be static and all the logging code is together in log.c, leaving meta_io.c with just metadata i/o code in it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
74669416 |
|
19-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Use list_for_each_entry_safe_reverse in gfs2_ail1_start() This is an attempt to fix Red Hat bz 204364. I don't hit it all the time, but with these changes, running postmark which used to trigger it on a regular basis no longer appears to. So I'm not saying that its 100% certain that its fixed, but it does look promising at the moment. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7d308590 |
|
18-Sep-2006 |
Fabio Massimo Di Nitto <fabbione@ubuntu.com> |
[GFS2] Export lm_interface to kernel headers lm_interface.h has a few out of the tree clients such as GFS1 and userland tools. Right now, these clients keeps a copy of the file in their build tree that can go out of sync. Move lm_interface.h to include/linux, export it to userland and clean up fs/gfs2 to use the new location. Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7a6bbacb |
|
18-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Map multiple blocks at once where possible This is a tidy up of the GFS2 bmap code. The main change is that the bh is passed to gfs2_block_map allowing the flags to be set directly rather than having to repeat that code several times in ops_address.c. At the same time, the extent mapping code from gfs2_extent_map has been moved into gfs2_block_map. This allows all calls to gfs2_block_map to map extents in the case that no allocation is taking place. As a result reads and non-allocating writes should be faster. A quick test with postmark appears to support this. There is a limit on the number of blocks mapped in a single bmap call in that it will only ever map blocks which are pointed to from a single pointer block. So in other words, it will never try to do additional i/o in order to satisfy read-ahead. The maximum number of blocks is thus somewhat less than 512 (the GFS2 4k block size minus the header divided by sizeof(u64)). I've further limited the mapping of "normal" blocks to 32 blocks (to avoid extra work) since readpages() will currently read a maximum of 32 blocks ahead (128k). Some further work will probably be needed to set a suitable value for DIO as well, but for now thats left at the maximum 512 (see ops_address.c:gfs2_get_block_direct). There is probably a lot more that can be done to improve bmap for GFS2, but this is a good first step. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
faa31ce8 |
|
13-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Tidy up log.c Based upon previous feedback from lkml and also removing some commented out debugging which is no longer needed. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
c5392124 |
|
05-Sep-2006 |
Jan Engelhardt <jengelh@linux01.gwdg.de> |
[GFS2] More style changes Remove redundant brackets Signed-off-by: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
a67cdbd4 |
|
05-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Style changes in logging code As per Jan Engelhardt's comments, removed some unused code and removed some brackets which were not required. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
cd915493 |
|
03-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Change all types to uX style This makes all fixed size types have consistent names. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e9fc2aa0 |
|
01-Sep-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update copyright, tidy up incore.h As per comments from Jan Engelhardt <jengelh@linux01.gwdg.de> this updates the copyright message to say "version" in full rather than "v.2". Also incore.h has been updated to remove forward structure declarations which are not required. The gfs2_quota_lvb structure has now had endianess annotations added to it. Also quota.c has been updated so that we now store the lvb data locally in endian independant format to avoid needing a structure in host endianess too. As a result the endianess conversions are done as required at various points and thus the conversion routines in lvb.[ch] are no longer required. I've moved the one remaining constant in lvb.h thats used into lm.h and removed the unused lvb.[ch]. I have not changed the HIF_ constants. That is left to a later patch which I hope will unify the gh_flags and gh_iflags fields of the struct gfs2_holder. Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5dc39fe6 |
|
24-Aug-2006 |
Benjamin Marzinski <bmarzins@redhat.com> |
[GFS2] Fix journal off-by-one error log_refund() incorrectly assumed that if a transaction had been touched, it always committed buffers to the incore log. Thus, when you got around to flushing the log, you would need one more block than you committed, to account for the header. So it automatically set reserved to 1, which had the effect of making sdp->sd_log_blks_reserved one greater when you got to gfs2_log_flush(). However, if you don't actually commit anything to the incore log between flushes, you don't need the header, because you aren't writing anything out. With this patch, log_refund() only increments reservered to account for the header if something has been committed since the last flush. Signed-off-by: Benjamin E. Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
59a1cc6b |
|
04-Aug-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix lock ordering bug in page fault path Mmapped files were able to trigger a lock ordering bug. Private maps do not need to take the glock so early on. Shared maps do unfortunately, however we can get around that by adding a flag into the flags for the struct gfs2_file. This only works because we are taking an exclusive lock at this point, so we know that nobody else can be racing with us. Fixes Red Hat bugzilla: #201196 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e0f2bf78 |
|
17-Jul-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix endian conversion bug Fix an endian coversion bug in log.c spotted by Kevin Anderson. Cc: Kevin Anderson <kanderso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fd4de2d4 |
|
05-Jul-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Add cast for printk Cast a uint64_t to unsigned long long for a printk. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
feaa7bba |
|
14-Jun-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix unlinked file handling This patch fixes the way we have been dealing with unlinked, but still open files. It removes all limits (other than memory for inodes, as per every other filesystem) on numbers of these which we can support on GFS2. It also means that (like other fs) its the responsibility of the last process to close the file to deallocate the storage, rather than the person who did the unlinking. Note that with GFS2, those two events might take place on different nodes. Also there are a number of other changes: o We use the Linux inode subsystem as it was intended to be used, wrt allocating GFS2 inodes o The Linux inode cache is now the point which we use for local enforcement of only holding one copy of the inode in core at once (previous to this we used the glock layer). o We no longer use the unlinked "special" file. We just ignore it completely. This makes unlinking more efficient. o We now use the 4th block allocation state. The previously unused state is used to track unlinked but still open inodes. o gfs2_inoded is no longer needed o Several fields are now no longer needed (and removed) from the in core struct gfs2_inode o Several fields are no longer needed (and removed) from the in core superblock There are a number of future possible optimisations and clean ups which have been made possible by this patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
3a8a9a10 |
|
18-May-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update copyright date to 2006 Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
bd896801 |
|
18-May-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Remove semaphore.h from C files We no longer use semaphores, everything has been converted to mutex or rwsem, so we don't need to include this header any more. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
fd88de56 |
|
05-May-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Readpages support This adds readpages support (and also corrects a small bug in the readpage error path at the same time). Hopefully this will improve performance by allowing GFS to submit larger lumps of I/O at a time. In order to simplify the setting of BH_Boundary, it currently gets set when we hit the end of a indirect pointer block. There is always a boundary at this point with the current allocation code. It doesn't get all the boundaries right though, so there is still room for improvement in this. See comments in fs/gfs2/ops_address.c for further information about readpages with GFS2. Signed-off-by: Steven Whitehouse
|
#
a74604be |
|
21-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] sem -> mutex conversion in locking.c Convert a semaphore to a mutex in locking.c and also tidy up one or two loose ends. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
190562bd |
|
20-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix a bug: scheduling under a spinlock At some stage, a mutex was added to gfs2_glock_put() without checking all its call sites. Two of them were called from under a spinlock causing random delays at various points and crashes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f4154ea0 |
|
11-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update journal accounting code. A small update to the journaling code to change the way that the "extra" blocks are accounted for in the journal. These are used at a rate of one per 503 metadata blocks or one per 251 journaled data blocks (or just one if the total number of journaled blocks in the transaction is smaller). Since we are using them at two different rates the old method of accounting for them no longer works and we count them up as required. Since the "per transaction" accounting can't handle this (there is no fixed number of header blocks per transaction) we have to account for it in the general journal code. We now require that each transaction reserves more blocks than it actually needs to take account of the possible extra blocks. Also a final fix to dir.c to ensure that all ref counts are handled correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
ed386507 |
|
07-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Finally get ref counting correct The last patch missed some other instances of incorrect ref counting, this fixes all of those too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b09e593d |
|
07-Apr-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix a ref count bug and other clean ups This fixes a ref count bug that sometimes showed up a umount time (causing it to hang) but it otherwise mostly harmless. At the same time there are some clean ups including making the log operations structures const, moving a memory allocation so that its not done in the fast path of checking to see if there is an outstanding transaction related to a particular glock. Removes the sd_log_wrap varaible which was updated, but never actually used anywhere. Updates the gfs2 ioctl() to run without the kernel lock (which it never needed anyway). Removes the "invalidate inodes" loop from GFS2's put_super routine. This is done in kill super anyway so we don't need to do it here. The loop was also bogus in that if there are any inodes "stuck" at this point its a bug and we need to know about it rather than hide it by hanging forever. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
e3167ded |
|
30-Mar-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS] Fix bug in endian conversion for metadata header In some cases 16 bit functions were being used rather than 32 bit functions. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
484adff8 |
|
29-Mar-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Update locking in log.c Replace the lock_for_trans()/lock_for_flush() functions with an rwsem. In fact the sd_log_flush_lock becomes an rwsem (the write part of it) and is extended slightly to cover everything that the lock_for_flush() used to cover. The read part of the lock is instead of lock_for_trans(). This corrects the races in the original code and reduces the code size. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
71b86f56 |
|
28-Mar-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Further updates to dir and logging code This reduces the size of the directory code by about 3k and gets readdir() to use the functions which were introduced in the previous directory code update. Two memory allocations are merged into one. Eliminates zeroing of some buffers which were never used before they were initialised by other data. There is still scope for further improvement in the directory code. On the logging side, a hand created mutex has been replaced by a standard Linux mutex in the log allocation code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b4dc7291 |
|
01-Mar-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix some bugs Fix a bug I introduced earlier with a kfree() and usage of a structure in the wrong order. Also try and get the counts of the journaled data buffers "more correct". Still some work to do in this area though. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
5c676f6d |
|
27-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Macros removal in gfs2.h As suggested by Pekka Enberg <penberg@cs.helsinki.fi>. The DIV_RU macro is renamed DIV_ROUND_UP and and moved to kernel.h The other macros are gone from gfs2.h as (although not requested by Pekka Enberg) are a number of included header file which are now included individually. The inode number comparison function is now an inline function. The DT2IF and IF2DT may be addressed in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
568f4c96 |
|
26-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] 80 Column audit of GFS2 Requested by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
6a6b3d01 |
|
23-Feb-2006 |
David Teigland <teigland@redhat.com> |
[GFS2] Patch to remove stats gathering from GFS2 Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
f55ab26a |
|
20-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Use mutices rather than semaphores As well as a number of minor bug fixes, this patch changes GFS to use mutices rather than semaphores. This results in better information in case there are any locking problems. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
7359a19c |
|
12-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Fix for root inode ref count bug Umount is now working correctly again. The bug was due to not getting an extra ref count when mounting the fs. We should have bumped it by two (once for the internal pointer to the root inode from the super block and once for the inode hanging off the dcache entry for root). Also this patch tidys up the code dealing with looking up and creating inodes. We now pass Linux inodes (with gfs2_inodes attached) rather than the other way around and this reduces code duplication in various places. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
18ec7d5c |
|
08-Feb-2006 |
Steven Whitehouse <swhiteho@redhat.com> |
[GFS2] Make journaled data files identical to normal files on disk This is a very large patch, with a few still to be resolved issues so you might want to check out the previous head of the tree since this is known to be unstable. Fixes for the various bugs will be forthcoming shortly. This patch removes the special data format which has been used up till now for journaled data files. Directories still retain the old format so that they will remain on disk compatible with earlier releases. As a result you can now do the following with journaled data files: 1) mmap them 2) export them over NFS 3) convert to/from normal files whenever you want to (the zero length restriction is gone) In addition the level at which GFS' locking is done has changed for all files (since they all now use the page cache) such that the locking is done at the page cache level rather than the level of the fs operations. This should mean that things like loopback mounts and other things which touch the page cache directly should now work. Current known issues: 1. There is a lock mode inversion problem related to the resource group hold function which needs to be resolved. 2. Any significant amount of I/O causes an oops with an offset of hex 320 (NULL pointer dereference) which appears to be related to a journaled data buffer appearing on a list where it shouldn't be. 3. Direct I/O writes are disabled for the time being (will reappear later) 4. There is probably a deadlock between the page lock and GFS' locks under certain combinations of mmap and fs operation I/O. 5. Issue relating to ref counting on internally used inodes causes a hang on umount (discovered before this patch, and not fixed by it) 6. One part of the directory metadata is different from GFS1 and will need to be resolved before next release. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
4f3df041 |
|
18-Jan-2006 |
Steven Whitehouse <steve@chygwyn.com> |
[GFS2] Change memory allocations to GFP_NOFS I'd like to be rid of these memory allocations entirely so far as is possible. For the moment though, mark them GFP_NOFS to make them less harmful. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
#
b3b94faa |
|
16-Jan-2006 |
David Teigland <teigland@redhat.com> |
[GFS2] The core of GFS2 This patch contains all the core files for GFS2. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|