#
21e308e6 |
|
22-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the xfs_buftarg_t typedef Switch the few remaining holdouts to the struct version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
0b3a76e9 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: use GFP_KERNEL in pure transaction contexts When running in a transaction context, memory allocations are scoped to GFP_NOFS. Hence we don't need to use GFP_NOFS contexts in pure transaction context allocations - GFP_KERNEL will automatically get converted to GFP_NOFS as appropriate. Go through the code and convert all the obvious GFP_NOFS allocations in transaction context to use GFP_KERNEL. This further reduces the explicit use of GFP_NOFS in XFS. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
d4c75a1b |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert remaining kmem_free() to kfree() The remaining callers of kmem_free() are freeing heap memory, so we can convert them directly to kfree() and get rid of kmem_free() altogether. This conversion was done with: $ for f in `git grep -l kmem_free fs/xfs`; do > sed -i s/kmem_free/kfree/ $f > done $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
49292576 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert kmem_free() for kvmalloc users to kvfree() Start getting rid of kmem_free() by converting all the cases where memory can come from vmalloc interfaces to calling kvfree() directly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
10634530 |
|
15-Jan-2024 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert kmem_zalloc() to kzalloc() There's no reason to keep the kmem_zalloc() around anymore, it's just a thin wrapper around kmalloc(), so let's get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
#
03f7767c |
|
22-Nov-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: use xfs_defer_pending objects to recover intent items One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed. This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done. Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function. However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed. This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
471de203 |
|
30-Oct-2023 |
Leah Rumancik <leah.rumancik@gmail.com> |
xfs: up(ic_sema) if flushing data device fails We flush the data device cache before we issue external log IO. If the flush fails, we shut down the log immediately and return. However, the iclog->ic_sema is left in a decremented state so let's add an up(). Prior to this patch, xfs/438 would fail consistently when running with an external log device: sync -> xfs_log_force -> xlog_write_iclog -> down(&iclog->ic_sema) -> blkdev_issue_flush (fail causes us to initiate shutdown) -> xlog_force_shutdown -> return unmount -> xfs_log_umount -> xlog_wait_iclog_completion -> down(&iclog->ic_sema) --------> HANG There is a second early return / shutdown. Make sure the up() happens for it as well. Also make sure we clean up the iclog state, xlog_state_done_syncing, before dropping the iclog lock. Fixes: b5d721eaae47 ("xfs: external logs need to flush data device") Fixes: 842a42d126b4 ("xfs: shutdown on failure to add page to log bio") Fixes: 7d839e325af2 ("xfs: check return codes when flushing block devices") Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
|
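The hang described in the up(ic_sema) fix can be modeled in user space. This is only a minimal sketch, assuming a toy counting semaphore and hypothetical names (`demo_sema`, `demo_write_iclog`, `demo_flush_fail`) that do not exist in the kernel; it shows why every early return taken after a successful down() needs a matching up():

```c
#include <assert.h>
#include <stdbool.h>

/* Toy counting semaphore; a hypothetical stand-in for iclog->ic_sema. */
struct demo_sema {
	int count;
};

/* Non-blocking down(): returns false where a real down() would sleep. */
bool demo_try_down(struct demo_sema *s)
{
	assert(s->count >= 0);
	if (s->count == 0)
		return false;
	s->count--;
	return true;
}

void demo_up(struct demo_sema *s)
{
	s->count++;
}

/* Hypothetical flush hooks: 0 on success, negative errno-style on failure. */
int demo_flush_ok(void)   { return 0; }
int demo_flush_fail(void) { return -5; }

/*
 * Model of the write path: take the semaphore, flush, and make sure the
 * semaphore is released on the early-return error path as well.
 */
int demo_write_iclog(struct demo_sema *s, int (*flush)(void))
{
	int error;

	if (!demo_try_down(s))
		return -1;	/* a real down() would block here */

	error = flush();
	if (error) {
		demo_up(s);	/* the fix: undo the down() before bailing out */
		return error;
	}

	/* Stand-in for I/O completion, which releases the semaphore later. */
	demo_up(s);
	return 0;
}
```

Without the `demo_up()` on the error path, the count stays at zero after a failed flush, and the next caller that tries to take the semaphore (the unmount path in the commit message) would wait forever.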
#
74ad4693 |
|
11-Sep-2023 |
Darrick J. Wong <djwong@kernel.org> |
xfs: fix log recovery when unknown rocompat bits are set Log recovery has always run on read only mounts, even where the primary superblock advertises unknown rocompat bits. Due to a misunderstanding between Eric and Darrick back in 2018, we accidentally changed the superblock write verifier to shut down the fs over that exact scenario. As a result, the log cleaning that occurs at the end of the mounting process fails if there are unknown rocompat bits set. As we now allow writing of the superblock if there are unknown rocompat bits set on a RO mount, we no longer want to turn off RO state to allow log recovery to succeed on a RO mount. Hence we also remove all the (now unnecessary) RO state toggling from the log recovery path. Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
f1e1765a |
|
28-Jun-2023 |
Dave Chinner <dchinner@redhat.com> |
xfs: journal geometry is not properly bounds checked If the journal geometry results in a sector or log stripe unit validation problem, it indicates that we cannot set the log up to safely write to the journal. In these cases, we must abort the mount because the corruption needs external intervention to resolve. Similarly, a journal that is too large cannot be written to safely, either, so we shouldn't allow those geometries to mount, either. If the log is too small, we risk having transaction reservations overrunning the available log space and the system hanging waiting for space it can never provide. This is purely a runtime hang issue, not a corruption issue as per the first cases listed above. We abort mounts if the log is too small for V5 filesystems, but we must allow v4 filesystems to mount because, historically, there was no log size validity checking and so some systems may still be out there with undersized logs. The problem is that on V4 filesystems, when we discover a log geometry problem, we skip all the remaining checks and then allow the log to continue mounting. This means that if one of the log size checks fails, we skip the log stripe unit check. i.e. we allow the mount because a "non-fatal" geometry is violated, and then fail to check the hard fail geometries that should fail the mount. Move all these fatal checks to the superblock verifier, and add a new check for the two log sector size geometry variables having the same values. This will prevent any attempt to mount a log that has invalid or inconsistent geometries long before we attempt to mount the log. However, for the minimum log size checks, we can only do that once we've set up the log and calculated all the iclog sizes and roundoffs. Hence this needs to remain in the log mount code after the log has been initialised. It is also the only case where we should allow a v4 filesystem to continue running, so leave that handling in place, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
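The ordering problem in the journal geometry commit can be illustrated with a small model. This is a sketch under stated assumptions, not the kernel's verifier: the struct, its fields, and the function names are all hypothetical. It shows fatal checks being evaluated unconditionally, with only the minimum-size check allowed to pass (with a warning) on a v4 filesystem:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the geometry a mount would validate. */
struct demo_geom {
	int  version;		/* 4 or 5 */
	bool sector_size_ok;	/* log sector size variables are consistent */
	bool stripe_unit_ok;	/* log stripe unit is valid */
	bool log_too_large;
	bool log_too_small;
};

enum demo_result { DEMO_MOUNT_OK, DEMO_MOUNT_WARN, DEMO_MOUNT_FAIL };

/* Fatal checks: always evaluated, never skipped because of another check. */
bool demo_fatal_geometry_ok(const struct demo_geom *g)
{
	return g->sector_size_ok && g->stripe_unit_ok && !g->log_too_large;
}

enum demo_result demo_check_mount(const struct demo_geom *g)
{
	assert(g->version == 4 || g->version == 5);

	/* Hard failures come first, regardless of filesystem version. */
	if (!demo_fatal_geometry_ok(g))
		return DEMO_MOUNT_FAIL;

	/* Only an undersized log is tolerated, and only on v4. */
	if (g->log_too_small)
		return g->version == 4 ? DEMO_MOUNT_WARN : DEMO_MOUNT_FAIL;

	return DEMO_MOUNT_OK;
}
```

In this model a v4 filesystem with both an undersized log and a bad stripe unit fails the mount, which is the case the commit message says was previously let through.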
#
1eb52a6a |
|
30-Nov-2022 |
Guo Xuenan <guoxuenan@huawei.com> |
xfs: wait iclog complete before tearing down AIL Fix a UAF in xfs_trans_ail_delete during xlog force shutdown. Commit cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") changed the order of running callbacks and waiting for iclog completion to keep the unmount path from destroying the AIL prematurely. That does not seem to be enough to ensure this; adding an mdelay in `xfs_buf_item_unpin` can prove it. The reproduction is as follows. To destroy the AIL safely, we should wait for all xlog ioend workers to finish and sync the AIL. ================================================================== BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W 6.1.0-rc1-00002-gc28266863c4a #137 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: xfs-log/sda xlog_ioend_work Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report+0x171/0x4a6 kasan_report+0xb3/0x130 xfs_trans_ail_delete+0x240/0x2a0 xfs_buf_item_done+0x7b/0xa0 xfs_buf_ioend+0x1e9/0x11f0 xfs_buf_item_unpin+0x4c8/0x860 xfs_trans_committed_bulk+0x4c2/0x7c0 xlog_cil_committed+0xab6/0xfb0 xlog_cil_process_committed+0x117/0x1e0 xlog_state_shutdown_callbacks+0x208/0x440 xlog_force_shutdown+0x1b3/0x3a0 xlog_ioend_work+0xef/0x1d0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 9606: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7a/0x90 __kmalloc+0x59/0x140 kmem_alloc+0xb2/0x2f0 xfs_trans_ail_init+0x20/0x320 xfs_log_mount+0x37e/0x690 xfs_mountfs+0xe36/0x1b40 xfs_fs_fill_super+0xc5c/0x1a70 get_tree_bdev+0x3c5/0x6c0 vfs_get_tree+0x85/0x250 path_mount+0xec3/0x1830 do_mount+0xef/0x110 __x64_sys_mount+0x150/0x1f0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9662: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 
kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 __kmem_cache_free+0x99/0x2d0 kvfree+0x3a/0x40 xfs_log_unmount+0x60/0xf0 xfs_unmountfs+0xf3/0x1d0 xfs_fs_put_super+0x78/0x300 generic_shutdown_super+0x151/0x400 kill_block_super+0x9a/0xe0 deactivate_locked_super+0x82/0xe0 deactivate_super+0x91/0xb0 cleanup_mnt+0x32a/0x4a0 task_work_run+0x15f/0x240 exit_to_user_mode_prepare+0x188/0x190 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888023169400 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffff888023169400, ffff888023169480) The buggy address belongs to the physical page: page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023168f80 pfn:0x23168 head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: cd6f79d1fb32 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
64c80dfd |
|
16-Nov-2022 |
Lukas Herbolt <lukas@herbolt.com> |
xfs: Print XFS UUID on mount and umount events. As of now only device names are printed out over __xfs_printk(). The device names are not persistent across reboots, so searching for the origin of a corruption requires the extra task of properly identifying the devices. This patch adds the XFS UUID to every mount/umount event, which will make the identification much easier. Signed-off-by: Lukas Herbolt <lukas@herbolt.com> [sandeen: rebase onto current upstream kernel] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
a251c17a |
|
05-Oct-2022 |
Jason A. Donenfeld <Jason@zx2c4.com> |
treewide: use get_random_u32() when possible The prandom_u32() function has been a deprecated inline wrapper around get_random_u32() for several releases now, and compiles down to the exact same code. Replace the deprecated wrapper with a direct call to the real function. The same also applies to get_random_int(), which is just a wrapper around get_random_u32(). This was done as a basic find and replace. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Acked-by: Helge Deller <deller@gmx.de> # for parisc Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
#
56171040 |
|
18-Sep-2022 |
Zeng Heng <zengheng4@huawei.com> |
xfs: remove redundant else for clean code "else" is not generally useful after a return, so remove it for clean code. There are no logical changes. Signed-off-by: Zeng Heng <zengheng4@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
7d839e32 |
|
28-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: check return codes when flushing block devices If a blkdev_issue_flush fails, fsync needs to report that to upper levels. Modify xfs_file_fsync to capture the errors, while trying to flush as much data and log updates to disk as possible. If log writes cannot flush the data device, we need to shut down the log immediately because we've violated a log invariant. Modify this code to check the return value of blkdev_issue_flush as well. This behavior seems to go back to about 2.6.15 or so, which makes this fixes tag a bit misleading. Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187 Fixes: b5071ada510a ("xfs: remove xfs_blkdev_issue_flush") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
5e9466a5 |
|
31-Jul-2022 |
Xie Shaowen <studentxswpy@163.com> |
xfs: delete extra space and tab in blank line delete extra space and tab in blank line, there is no functional change. Reported-by: Hacash Robot <hacashRobot@santino.com> Signed-off-by: Xie Shaowen <studentxswpy@163.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
d9f68777 |
|
07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: xlog_sync() manually adjusts grant head space When xlog_sync() rounds off the tail of the iclog that is being flushed, it manually subtracts that space from the grant heads. This space is actually reserved by the transaction ticket that covers the xlog_sync() call from xlog_write(), but we don't plumb the ticket down far enough for it to account for the space consumed in the current log ticket. The grant heads are hot, so we really should be accounting this to the ticket if we can, rather than adding thousands of extra grant head updates every CIL commit. Interestingly, this actually indicates a potential log space overrun can occur when we force the log. By the time that xfs_log_force() pushes out an active iclog and consumes the roundoff space, the reservation for that roundoff space has been returned to the grant heads and is no longer covered by a reservation. In theory the roundoff added to log force on an already full log could push the write head past the tail. In practice, the CIL commit that writes to the log and needs the iclog pushed will have reserved space for roundoff, so when it releases the ticket there will still be physical space for the roundoff to be committed to the log, even though it is no longer reserved. This roundoff won't be enough space to allow a transaction to be woken if the log is full, so overruns should not actually occur in practice. That said, it indicates that we should not release the CIL context log ticket until after we've released the commit iclog. It also means that xlog_sync() still needs the direct grant head manipulation if we don't provide it with a ticket. Log forces are rare when we are in fast paths running 1.5 million transactions/s that make the grant heads hot, so let's optimise the hot case and pass CIL log tickets down to the xlog_sync() code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
16924853 |
|
07-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert log vector chain to use list heads Because the next change is going to require sorting log vectors, and that requires arbitrary rearrangement of the list which cannot be done easily with a single linked list. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
#
31151cc3 |
|
01-Jul-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: rework per-iclog header CIL reservation For every iclog that a CIL push will use up, we need to ensure we have space reserved for the iclog header in each iclog. It is extremely difficult to do this accurately with a per-cpu counter without expensive summing of the counter in every commit. However, we know what the maximum CIL size is going to be because of the hard space limit we have, and hence we know exactly how many iclogs we are going to need to write out the CIL. We are constrained by the requirement that small transactions only have reservation space for a single iclog header built into them. At commit time we don't know how much of the current transaction reservation is made up of iclog header reservations as calculated by xfs_log_calc_unit_res() when the ticket was reserved. As larger reservations have multiple header spaces reserved, we can steal more than one iclog header reservation at a time, but we only steal the exact number needed for the given log vector size delta. As a result, we don't know exactly when we are going to steal iclog header reservations, nor do we know exactly how many we are going to need for a given CIL. To make things simple, start by calculating the worst case number of iclog headers a full CIL push will require. Record this into an atomic variable in the CIL. Then add a byte counter to the log ticket that records exactly how much iclog header space has been reserved in this ticket by xfs_log_calc_unit_res(). This tells us exactly how much space we can steal from the ticket at transaction commit time. Now, at transaction commit time, we can check if the CIL has a full iclog header reservation and, if not, steal the entire reservation the current ticket holds for iclog headers. This minimises the number of times we need to do atomic operations in the fast path, but still guarantees we get all the reservations we need. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
|
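The commit-time stealing described in the per-iclog header reservation commit can be sketched roughly as follows. This is an illustrative model only: the structures, field names, and `demo_steal_hdr_space()` are hypothetical and far simpler than the real CIL and ticket code. It assumes a worst-case header requirement recorded up front and a per-ticket byte counter of header space:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical CIL state: worst-case need versus what has been collected. */
struct demo_cil {
	int        hdr_space_needed;	/* worst case for a full CIL push */
	atomic_int hdr_space_held;	/* header bytes stolen so far */
};

/* Hypothetical ticket: how much of its reservation is iclog header space. */
struct demo_ticket {
	int iclog_hdr_bytes;
};

void demo_cil_init(struct demo_cil *cil, int needed)
{
	cil->hdr_space_needed = needed;
	atomic_init(&cil->hdr_space_held, 0);
}

/*
 * At commit time: if the CIL does not yet hold a full header reservation,
 * take everything the ticket reserved for iclog headers in one go.
 * Returns the number of bytes moved from the ticket to the CIL.
 */
int demo_steal_hdr_space(struct demo_cil *cil, struct demo_ticket *tic)
{
	int stolen;

	assert(tic->iclog_hdr_bytes >= 0);

	/* Common case once the CIL is fully reserved: a single atomic read. */
	if (atomic_load(&cil->hdr_space_held) >= cil->hdr_space_needed)
		return 0;

	stolen = tic->iclog_hdr_bytes;
	tic->iclog_hdr_bytes = 0;
	atomic_fetch_add(&cil->hdr_space_held, stolen);
	return stolen;
}
```

Taking the whole per-ticket amount at once, rather than the exact amount needed, is what keeps the number of atomic updates low in this model; the CIL may briefly hold more than the worst case.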
#
7561cea5 |
|
01-Jul-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: prevent a UAF when log IO errors race with unmount KASAN reported the following use after free bug when running generic/475: XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Ending recovery (logdev: internal) Buffer I/O error on dev dm-0, logical block 20639616, async page read Buffer I/O error on dev dm-0, logical block 20639617, async page read XFS (dm-0): log I/O error -5 XFS (dm-0): Filesystem has been shut down due to log error (0x2). XFS (dm-0): Unmounting Filesystem XFS (dm-0): Please unmount the filesystem and rectify the problem(s). ================================================================== BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270 Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136 CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs] Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2b8/0x661 ? do_raw_spin_lock+0x246/0x270 kasan_report+0xab/0x120 ? do_raw_spin_lock+0x246/0x270 do_raw_spin_lock+0x246/0x270 ? rwlock_bug.part.0+0x90/0x90 xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] process_one_work+0x672/0x1040 worker_thread+0x59b/0xec0 ? __kthread_parkme+0xc6/0x1f0 ? process_one_work+0x1040/0x1040 ? process_one_work+0x1040/0x1040 kthread+0x29e/0x340 ? 
kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 154099: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0x8d/0x2e0 [xfs] xlog_cil_init+0x1f/0x540 [xfs] xlog_alloc_log+0xd1e/0x1260 [xfs] xfs_log_mount+0xba/0x640 [xfs] xfs_mountfs+0xf2b/0x1d00 [xfs] xfs_fs_fill_super+0x10af/0x1910 [xfs] get_tree_bdev+0x383/0x670 vfs_get_tree+0x7d/0x240 path_mount+0xdb7/0x1890 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 154151: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x110/0x190 slab_free_freelist_hook+0xab/0x180 kfree+0xbc/0x310 xlog_dealloc_log+0x1b/0x2b0 [xfs] xfs_unmountfs+0x119/0x200 [xfs] xfs_fs_put_super+0x6e/0x2e0 [xfs] generic_shutdown_super+0x12b/0x3a0 kill_block_super+0x95/0xd0 deactivate_locked_super+0x80/0x130 cleanup_mnt+0x329/0x4d0 task_work_run+0xc5/0x160 exit_to_user_mode_prepare+0xd4/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This appears to be a race between the unmount process, which frees the CIL and waits for in-flight iclog IO; and the iclog IO completion. When generic/475 runs, it starts fsstress in the background, waits a few seconds, and substitutes a dm-error device to simulate a disk falling out of a machine. If the fsstress encounters EIO on a pure data write, it will exit but the filesystem will still be online. The next thing the test does is unmount the filesystem, which tries to clean the log, free the CIL, and wait for iclog IO completion. 
If an iclog was being written when the dm-error switch occurred, it can race with log unmounting as follows: Thread 1 Thread 2 xfs_log_unmount xfs_log_clean xfs_log_quiesce xlog_ioend_work <observe error> xlog_force_shutdown test_and_set_bit(XLOG_IOERROR) xfs_log_force <log is shut down, nop> xfs_log_umount_write <log is shut down, nop> xlog_dealloc_log xlog_cil_destroy <wait for iclogs> spin_lock(&log->l_cilp->xc_push_lock) <KABOOM> Therefore, free the CIL after waiting for the iclogs to complete. I /think/ this race has existed for quite a few years now, though I don't remember the ~2014 era logging code well enough to know if it was a real threat then or if the actual race was exposed only more recently. Fixes: ac983517ec59 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
d9c61ccb |
|
26-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: move xfs_attr_use_log_assist out of xfs_log.c The LARP patchset added an awkward coupling point between libxfs and what would be libxlog, if the XFS log were actually its own library. Move the code that enables logged xattr updates out of "lib"xlog and into xfs_xattr.c so that it no longer has to know about xlog_* functions. While we're at it, give xfs_xattr.c its own header file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
202865cc |
|
26-May-2022 |
Darrick J. Wong <djwong@kernel.org> |
xfs: warn about LARP once per mount Since LARP is an experimental debug-only feature, we should try to warn about it being in use once per mount, not once per reboot. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f3f36c89 |
|
11-May-2022 |
Allison Henderson <allison.henderson@oracle.com> |
xfs: Add xfs_attr_set_deferred and xfs_attr_remove_deferred These routines set up and queue a new deferred attribute operations. These functions are meant to be called by any routine needing to initiate a deferred attribute operation as opposed to the existing inline operations. New helper function xfs_attr_item_init also added. Finally enable delayed attributes in xfs_attr_set and xfs_attr_remove. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2eb7550d |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert shutdown reasons to unsigned. 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
14b07ecd |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: xlog_write() doesn't need optype anymore So remove it from the interface and callers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
be8ddda5 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: xlog_write() no longer needs contwr state The rework of xlog_write() no longer requires xlog_get_iclog_state() to tell it about internal iclog space reservation state to direct it on what to do. Remove this parameter. $ size fs/xfs/xfs_log.o.* text data bss dec hex filename 26520 560 8 27088 69d0 fs/xfs/xfs_log.o.orig 26384 560 8 26952 6948 fs/xfs/xfs_log.o.patched Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
1236bbe8 |
|
20-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xlog_verify_dest_ptr Just check that the offset in xlog_write_vec is smaller than the iclog size and remove the expensive cycling through all iclogs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ad3e3693 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xlog_write_partial() Re-implement writing of a log vector that does not fit into the current iclog. The iclog will already be in XLOG_STATE_WANT_SYNC because xlog_get_iclog_space() will have reserved all the remaining iclog space for us, hence we can simply iterate over the iovecs in the log vector getting more iclog space until the entire log vector is written. Handling this partial write case separately means we do not need to pass unnecessary state around for the common, fast path case when the log vector fits entirely within the current iclog. It isolates the complexity and allows us to modify and improve the partial write case without impacting the simple fast path. This change includes several improvements incorporated from patches written by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
db357078 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xlog_write_full() Introduce an optimised version of xlog_write() that is used when the entire write will fit in a single iclog. This greatly simplifies the implementation of writing a log vector chain into an iclog, and sets the ground work for a much more understandable xlog_write() implementation. This incorporates some factoring and simplifications proposed by Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
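The split between a fast path for writes that fit in the current iclog and a slower path for writes that do not can be pictured with a toy model. This is a sketch under loud assumptions: `DEMO_ICLOG_SIZE`, `demo_write()`, and `demo_write_partial()` are hypothetical, regions are plain byte counts, and the op headers a real split would need are ignored. It only counts how many iclogs a write touches:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ICLOG_SIZE 16	/* hypothetical size of a fresh iclog, in bytes */

/* Slow path: walk the regions, fetching a new iclog whenever space runs out. */
size_t demo_write_partial(const size_t *regions, size_t nregions,
			  size_t space_left)
{
	size_t iclogs = 1;

	for (size_t i = 0; i < nregions; i++) {
		size_t remaining = regions[i];

		while (remaining > 0) {
			if (space_left == 0) {
				space_left = DEMO_ICLOG_SIZE;
				iclogs++;
			}
			size_t chunk = remaining < space_left ?
					remaining : space_left;
			remaining -= chunk;
			space_left -= chunk;
		}
	}
	return iclogs;
}

/* Entry point: returns the number of iclogs the write touches. */
size_t demo_write(const size_t *regions, size_t nregions, size_t space_left)
{
	size_t total = 0;

	assert(regions != NULL || nregions == 0);
	for (size_t i = 0; i < nregions; i++)
		total += regions[i];

	/* Fast path: everything fits, so no per-region space tracking. */
	if (total <= space_left)
		return 1;

	return demo_write_partial(regions, nregions, space_left);
}
```

Keeping the splitting loop in its own function means the common case carries none of the bookkeeping that partial writes need.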
#
decb545f |
|
20-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
xfs: change the type of ic_datap Turn ic_datap from a char into a void pointer given that it points to arbitrary data. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [dgc: also remove (char *) cast in xlog_alloc_log()] Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
d80fc291 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: pass lv chain length into xlog_write() The caller of xlog_write() usually has a close accounting of the aggregated vector length contained in the log vector chain passed to xlog_write(). There is no need to iterate the chain to calculate the length of the data in xlog_write_calculate_len() if the caller is already iterating that chain to build it. Passing in the vector length avoids doing an extra chain iteration, which can be a significant amount of work given that large CIL commits can have hundreds of thousands of vectors attached to the chain. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
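The idea of accounting the chain length while the chain is being built, instead of walking it again later, can be shown with a small linked-list model. The types and functions below (`demo_lv`, `demo_chain`, `demo_chain_add`) are hypothetical and are not the kernel's log vector structures:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log vector: only the payload size matters for this sketch. */
struct demo_lv {
	struct demo_lv *next;
	size_t bytes;
};

/* Hypothetical chain that keeps a running total as vectors are appended. */
struct demo_chain {
	struct demo_lv *head;
	struct demo_lv *tail;
	size_t total_bytes;
};

/* Append a vector and account its length in the same pass. */
void demo_chain_add(struct demo_chain *c, struct demo_lv *lv)
{
	assert(lv != NULL);
	lv->next = NULL;
	if (c->tail)
		c->tail->next = lv;
	else
		c->head = lv;
	c->tail = lv;
	c->total_bytes += lv->bytes;
}

/* The extra walk that a precomputed total makes unnecessary. */
size_t demo_chain_len_by_walking(const struct demo_chain *c)
{
	size_t len = 0;

	for (const struct demo_lv *lv = c->head; lv; lv = lv->next)
		len += lv->bytes;
	return len;
}
```

A writer can take `total_bytes` directly from the builder; with hundreds of thousands of vectors in a chain, skipping the second walk avoids a full pass over the list.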
#
c5141320 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: log ticket region debug is largely useless xlog_tic_add_region() is used to trace the regions being added to a log ticket to provide information in the situation where a ticket reservation overrun occurs. The information gathered is stored in the ticket, and dumped if xlog_print_tic_res() is called. For a front end struct xfs_trans overrun, the ticket only contains reservation tracking information - the ticket is never handed to the log so has no regions attached to it. The overrun debug information in this case comes from xlog_print_trans(), which walks the items attached to the transaction and dumps their attached formatted log vectors directly. It also dumps the ticket state, but that only contains reservation accounting and nothing else. Hence xlog_print_tic_res() never dumps region or overrun information from this path. xlog_tic_add_region() is actually called from xlog_write(), which means it is being used to track the regions seen in a CIL checkpoint log vector chain. In looking at CIL behaviour recently, I've seen 32MB checkpoints regularly exceed 250,000 regions in the LV chain. The log ticket debug code can track *15* regions. IOWs, if there is a ticket overrun in the CIL code, the ticket region tracking code is going to be completely useless for determining what went wrong. The only thing it can tell us is how much of an overrun occurred, and we really don't need extra debug information in the log ticket to tell us that. Indeed, the main place we call xlog_tic_add_region() is also adding up the number of regions and the space used so that xlog_write() knows how much will be written to the log. This is exactly the same information that log ticket is storing once we take away the useless region tracking array. Hence xlog_tic_add_region() is not useful, but can be called 250,000 times a CIL push... Just strip all that debug "information" out of the log ticket and only have it report reservation space information when an overrun occurs. 
This also reduces the size of a log ticket down by about 150 bytes... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
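A minimal userspace sketch of the idea above, assuming a simplified ticket that keeps only reservation accounting. The `demo_ticket` type and its helpers are hypothetical names for illustration and are not the kernel's `struct xlog_ticket` API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a log ticket: reservation only. */
struct demo_ticket {
	int	unit_res;	/* bytes reserved for the transaction */
	int	curr_res;	/* bytes still unused; negative means overrun */
};

static void demo_ticket_init(struct demo_ticket *tic, int unit_res)
{
	tic->unit_res = unit_res;
	tic->curr_res = unit_res;
}

/* Charge bytes written to the log against the reservation. */
static void demo_ticket_use(struct demo_ticket *tic, int bytes)
{
	tic->curr_res -= bytes;
}

/* Return the overrun in bytes, or 0 if the reservation held. */
static int demo_ticket_overrun(const struct demo_ticket *tic)
{
	return tic->curr_res < 0 ? -tic->curr_res : 0;
}

/* Report reservation space only; there is no per-region array to dump. */
static void demo_ticket_print(const struct demo_ticket *tic)
{
	printf("unit res = %d bytes, current res = %d bytes, overrun = %d bytes\n",
	       tic->unit_res, tic->curr_res, demo_ticket_overrun(tic));
}
```

With a 1000 byte reservation, charging 600 and then 500 bytes leaves `curr_res` at -100, so the only thing left to report is a 100 byte overrun, which is the information the commit message says is still worth keeping.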
|
#
8d547cf9 |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: reserve space and initialise xlog_op_header in item formatting Currently xlog_write() adds op headers to the log manually for every log item region that is in the vector passed to it. While xlog_write() needs to stamp the transaction ID into the ophdr, we already know its length, flags, clientid, etc at CIL commit time. This means the only time that xlog_write() really needs to format and reserve space for a new ophdr is when a region is split across two iclogs. Adding the opheader and accounting for it as part of the normal formatted item region means we simplify the accounting of space used by a transaction and we don't have to special case reserving of space for the ophdrs in xlog_write(). It also means we can largely initialise the ophdr in transaction commit instead of xlog_write, making the xlog_write formatting inner loop much tighter. xlog_prepare_iovec() is now too large to stay as an inline function, so we move it out of line and into xfs_log.c. Object sizes: text data bss dec hex filename 1125934 305951 484 1432369 15db31 fs/xfs/built-in.a.before 1123360 305951 484 1429795 15d123 fs/xfs/built-in.a.after So the code is roughly 2.5kB smaller with xlog_prepare_iovec() now out of line, even though it grew in size itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
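The approach described above can be sketched in userspace roughly as follows: header space is reserved when a region is formatted, and only the transaction ID is filled in later at write time. The `demo_op_header` layout and the `demo_*` helpers are assumptions made for illustration; they are not the kernel's `xlog_prepare_iovec()` or `xlog_write()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 12-byte op header; field names are illustrative only. */
struct demo_op_header {
	uint32_t tid;		/* transaction ID, stamped at write time */
	uint32_t len;		/* payload length, excluding this header */
	uint8_t  clientid;
	uint8_t  flags;
	uint16_t pad;
};

/*
 * Format one region into buf: reserve and initialise the header first,
 * then copy the payload behind it. Returns the bytes consumed (header
 * plus payload), or 0 if the region does not fit.
 */
static size_t demo_format_region(uint8_t *buf, size_t bufsize,
				 const void *payload, uint32_t len,
				 uint8_t clientid)
{
	struct demo_op_header hdr = { .tid = 0, .len = len, .clientid = clientid };

	if (bufsize < sizeof(hdr) || len > bufsize - sizeof(hdr))
		return 0;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy(buf + sizeof(hdr), payload, len);
	return sizeof(hdr) + len;
}

/* At write time only the transaction ID still needs to be stamped. */
static void demo_stamp_tid(uint8_t *region, uint32_t tid)
{
	memcpy(region + offsetof(struct demo_op_header, tid), &tid, sizeof(tid));
}
```

Because the header is part of the formatted region, the space accounting for a region is simply the return value of `demo_format_region()`, with no separate per-header bookkeeping in the write path.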
|
#
c7610dce |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: log tickets don't need log client id We currently set the log ticket client ID when we reserve a transaction. This client ID is only ever written to the log by a CIL checkpoint or unmount records, and so anything using a high level transaction allocated through xfs_trans_alloc() does not need a log ticket client ID to be set. For the CIL checkpoint, the client ID written to the journal is always XFS_TRANSACTION, and for the unmount record it is always XFS_LOG, and nothing else writes to the log. All of these operations tell xlog_write() exactly what they need to write to the log (the optype) and build their own opheaders for start, commit and unmount records. Hence we no longer need to set the client id in either the log ticket or the xfs_trans. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
54021b62 |
|
20-Apr-2022 |
Dave Chinner <david@fromorbit.com> |
xfs: embed the xlog_op_header in the commit record Remove the final case where xlog_write() has to prepend an opheader to a log transaction. Similar to the start record, the commit record is just an empty opheader with an XLOG_COMMIT_TRANS type, so we can just make this the payload for the region being passed to xlog_write() and remove the special handling in xlog_write() for the commit record. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ffa04c1f |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: embed the xlog_op_header in the unmount record Remove another case where xlog_write() has to prepend an opheader to a log transaction. The unmount record + ophdr is smaller than the minimum amount of space guaranteed to be free in an iclog (2 * sizeof(ophdr)) and so we don't have to care about an unmount record being split across 2 iclogs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
6eaed95e |
|
20-Apr-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: only CIL pushes require a start record So move the one-off start record writing in xlog_write() out into the static header that the CIL push builds to write into the log initially. This simplifies the xlog_write() logic a lot. pahole on x86-64 confirms that the xlog_cil_trans_hdr is correctly 32 bit aligned and packed for copying the log op and transaction headers directly into the log as a single log region copy. struct xlog_cil_trans_hdr { struct xlog_op_header oph[2]; /* 0 24 */ struct xfs_trans_header thdr; /* 24 16 */ struct xfs_log_iovec lhdr[2]; /* 40 32 */ /* size: 72, cachelines: 2, members: 3 */ /* last cacheline: 8 bytes */ }; A wart is needed to handle the fact that the length of the region the opheader points to doesn't include the opheader length. Hence if we embed the opheader, we have to subtract the opheader length from the length written into the opheader by the generic copying code. This will eventually go away when everything is converted to embedded opheaders. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
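The pahole layout quoted above can be checked at compile time with a small mirror of the structure. This is a sketch under stated assumptions: the `demo_*` structs only reproduce the member sizes from the pahole output (12, 16 and 16 bytes), their field names are hypothetical, and a `uint64_t` stands in for the 64-bit pointer in the iovec. The last helper illustrates the length "wart" described in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirrors sized to match the quoted pahole output. */
struct demo_op_header {		/* 12 bytes */
	uint32_t tid;
	uint32_t len;
	uint8_t  clientid;
	uint8_t  flags;
	uint16_t pad;
};

struct demo_trans_header {	/* 16 bytes */
	uint32_t magic;
	uint32_t type;
	int32_t  tid;
	uint32_t num_items;
};

struct demo_log_iovec {		/* 16 bytes; addr stands in for a pointer */
	uint64_t addr;
	int32_t  len;
	uint32_t type;
};

struct demo_cil_trans_hdr {
	struct demo_op_header	 oph[2];
	struct demo_trans_header thdr;
	struct demo_log_iovec	 lhdr[2];
};

/* Offsets and size as reported by pahole in the commit message. */
_Static_assert(offsetof(struct demo_cil_trans_hdr, oph) == 0, "oph offset");
_Static_assert(offsetof(struct demo_cil_trans_hdr, thdr) == 24, "thdr offset");
_Static_assert(offsetof(struct demo_cil_trans_hdr, lhdr) == 40, "lhdr offset");
_Static_assert(sizeof(struct demo_cil_trans_hdr) == 72, "total size");

/*
 * The "wart": the length stored in an embedded op header must not
 * count the header itself, so subtract it from the region length.
 */
static uint32_t demo_embedded_oph_len(uint32_t region_len)
{
	return region_len - (uint32_t)sizeof(struct demo_op_header);
}
```

For a region made of one op header followed by the transaction header (28 bytes in this mirror), the length recorded in the embedded header would be 16.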
|
#
919edbad |
|
29-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: drop async cache flushes from CIL commits. Jan Kara reported a performance regression in dbench that he bisected down to commit bad77c375e8d ("xfs: CIL checkpoint flushes caches unconditionally"). Whilst developing the journal flush/fua optimisations this cache was part of, it appeared to make a significant difference to performance. However, now that this patchset has settled and all the correctness issues are fixed, there does not appear to be any significant performance benefit to asynchronous cache flushes. In fact, the opposite is true on some storage types and workloads, where additional cache flushes that can occur from fsync heavy workloads have measurable and significant impact on overall throughput. Local dbench testing shows little difference on dbench runs with sync vs async cache flushes on either fast or slow SSD storage, and no difference in streaming concurrent async transaction workloads like fs-mark. Fast NVME storage. From `dbench -t 30`, CIL scale: clients async sync BW Latency BW Latency 1 935.18 0.855 915.64 0.903 8 2404.51 6.873 2341.77 6.511 16 3003.42 6.460 2931.57 6.529 32 3697.23 7.939 3596.28 7.894 128 7237.43 15.495 7217.74 11.588 512 5079.24 90.587 5167.08 95.822 fsmark, 32 threads, create w/ 64 byte xattr w/32k logbsize create chown unlink async 1m41s 1m16s 2m03s sync 1m40s 1m19s 1m54s Slower SATA SSD storage: From `dbench -t 30`, CIL scale: clients async sync BW Latency BW Latency 1 78.59 15.792 83.78 10.729 8 367.88 92.067 404.63 59.943 16 564.51 72.524 602.71 76.089 32 831.66 105.984 870.26 110.482 128 1659.76 102.969 1624.73 91.356 512 2135.91 223.054 2603.07 161.160 fsmark, 16 threads, create w/32k logbsize create unlink async 5m06s 4m15s sync 5m00s 4m22s And on Jan's test machine: 5.18-rc8-vanilla 5.18-rc8-patched Amean 1 71.22 ( 0.00%) 64.94 * 8.81%* Amean 2 93.03 ( 0.00%) 84.80 * 8.85%* Amean 4 150.54 ( 0.00%) 137.51 * 8.66%* Amean 8 252.53 ( 0.00%) 242.24 * 4.08%* Amean 16 454.13 ( 0.00%) 439.08 * 3.31%* Amean 32 835.24 ( 
0.00%) 829.74 * 0.66%* Amean 64 1740.59 ( 0.00%) 1686.73 * 3.09%* Performance and cache flush behaviour is restored to pre-regression levels. As such, we can now consider the async cache flush mechanism an unnecessary exercise in premature optimisation and hence we can now remove it and the infrastructure it requires completely. Fixes: bad77c375e8d ("xfs: CIL checkpoint flushes caches unconditionally") Reported-and-tested-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
5652ef31 |
|
29-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: shutdown during log recovery needs to mark the log shutdown When a checkpoint writeback is run by log recovery, corruption propagated from the log can result in writeback verifiers failing and calling xfs_force_shutdown() from xfs_buf_delwri_submit_buffers(). This results in the mount being marked as shutdown, but the log does not get marked as shut down because: /* * If this happens during log recovery then we aren't using the runtime * log mechanisms yet so there's nothing to shut down. */ if (!log || xlog_in_recovery(log)) return false; If there are other buffers that then fail (say due to detecting the mount shutdown), they will now hang in xfs_do_force_shutdown() waiting for the log to shut down like this: __schedule+0x30d/0x9e0 schedule+0x55/0xd0 xfs_do_force_shutdown+0x1cd/0x200 ? init_wait_var_entry+0x50/0x50 xfs_buf_ioend+0x47e/0x530 __xfs_buf_submit+0xb0/0x240 xfs_buf_delwri_submit_buffers+0xfe/0x270 xfs_buf_delwri_submit+0x3a/0xc0 xlog_do_recovery_pass+0x474/0x7b0 ? do_raw_spin_unlock+0x30/0xb0 xlog_do_log_recovery+0x91/0x140 xlog_do_recover+0x38/0x1e0 xlog_recover+0xdd/0x170 xfs_log_mount+0x17e/0x2e0 xfs_mountfs+0x457/0x930 xfs_fs_fill_super+0x476/0x830 xlog_force_shutdown() always needs to mark the log as shut down, regardless of whether recovery is in progress or not, so that multiple calls to xfs_force_shutdown() during recovery don't end up waiting for the log to be shut down like this. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
41e63621 |
|
29-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_do_force_shutdown needs to block racing shutdowns When we call xfs_force_shutdown(), the caller often expects the filesystem to be completely shut down when it returns. However, if we have racing xfs_force_shutdown() calls, the first caller sets the mount shutdown flag then goes to shut down the log. The second caller sees the mount shutdown flag and returns immediately - it does not wait for the log to be shut down. Unfortunately, xfs_force_shutdown() is used in some places that expect it to completely shut down the filesystem before it returns (e.g. xfs_trans_log_inode()). As such, returning before the log has been shut down leaves us in a place where the transaction failed to complete correctly but we still call xfs_trans_commit(). This situation arises because xfs_trans_log_inode() does not return an error and instead calls xfs_force_shutdown() to ensure that the transaction being committed is aborted. Unfortunately, we have a race condition where xfs_trans_commit() needs to check xlog_is_shutdown() because it can't abort log items before the log is shut down, but it needs to use xfs_is_shutdown() because xfs_force_shutdown() does not block waiting for the log to shut down. To fix this conundrum, first we make all calls to xfs_force_shutdown() block until the log is also shut down. This means we can then safely use xfs_force_shutdown() as a mechanism that ensures the currently running transaction will be aborted by xfs_trans_commit() regardless of the shutdown check it uses. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b5f17bec |
|
29-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: log shutdown triggers should only shut down the log We've got a mess on our hands. 1. xfs_trans_commit() cannot cancel transactions because the mount is shut down - that causes dirty, aborted, unlogged log items to sit unpinned in memory and potentially get written to disk before the log is shut down. Hence xfs_trans_commit() can only abort transactions when xlog_is_shutdown() is true. 2. xfs_force_shutdown() is used in places to cause the current modification to be aborted via xfs_trans_commit() because it may be impractical or impossible to cancel the transaction directly, and hence xfs_trans_commit() must cancel transactions when xfs_is_shutdown() is true in this situation. But we can't do that because of #1. 3. Log IO errors cause log shutdowns by calling xfs_force_shutdown() to shut down the mount and then the log from log IO completion. 4. xfs_force_shutdown() can result in a log force being issued, which has to wait for log IO completion before it will mark the log as shut down. If #3 races with some other shutdown trigger that runs a log force, we rely on xfs_force_shutdown() silently ignoring #3 and avoiding shutting down the log until the failed log force completes. 5. To ensure #2 always works, we have to ensure that xfs_force_shutdown() does not return until the log is shut down. But in the case of #4, this will result in a deadlock because the log IO completion will block waiting for a log force to complete which is blocked waiting for log IO to complete.... So the very first thing we have to do here to untangle this mess is dissociate log shutdown triggers from mount shutdowns. We already have xlog_force_shutdown(), which will atomically transition the log to a shutdown state. Due to internal asserts it cannot be called multiple times, but that was done simply because the only place that could call it was xfs_do_force_shutdown() (i.e. the mount shutdown!) and that could only call it once and once only. 
So the first thing we do is remove the asserts. We then convert all the internal log shutdown triggers to call xlog_force_shutdown() directly instead of xfs_force_shutdown(). This allows the log shutdown triggers to shut down the log without needing to care about mount based shutdown constraints. This means we shut down the log independently of the mount and the mount may not notice this until its next attempt to read or modify metadata. At that point (e.g. xfs_trans_commit()) it will see that the log is shut down, error out and shut down the mount. To ensure that all the unmount behaviours and asserts track correctly as a result of a log shutdown, propagate the shutdown up to the mount if it is not already set. This keeps the mount and log state in sync, and saves a huge amount of hassle where code fails because of a log shutdown but only checks for mount shutdowns and hence ends up doing the wrong thing. Cleaning up that mess is an exercise for another day. This enables us to address the other problems noted above in followup patches. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
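A rough userspace sketch of the behaviour described above: a log shutdown that may be triggered more than once, acts only on the first call, and propagates the state up to the mount. The `demo_sd_*` types and functions are hypothetical and use C11 atomics in place of the kernel's bit operations; this is not the kernel's xlog_force_shutdown().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the mount and the log. */
struct demo_sd_mount {
	atomic_bool shutdown;
};

struct demo_sd_log {
	atomic_bool		shutdown;
	struct demo_sd_mount	*mp;
};

static void demo_sd_init(struct demo_sd_mount *mp, struct demo_sd_log *log)
{
	atomic_init(&mp->shutdown, false);
	atomic_init(&log->shutdown, false);
	log->mp = mp;
}

/*
 * Shut down the log. Safe to call repeatedly: only the first caller
 * performs the transition. Returns true if this call did the shutdown.
 */
static bool demo_sd_log_force_shutdown(struct demo_sd_log *log)
{
	if (atomic_exchange(&log->shutdown, true))
		return false;	/* already shut down by an earlier trigger */

	/* Propagate upwards so mount-level checks also see the shutdown. */
	atomic_store(&log->mp->shutdown, true);
	return true;
}

static bool demo_sd_log_is_shutdown(struct demo_sd_log *log)
{
	return atomic_load(&log->shutdown);
}

static bool demo_sd_mount_is_shutdown(struct demo_sd_mount *mp)
{
	return atomic_load(&mp->shutdown);
}
```

The atomic exchange plays the role of the "atomically transition" step: a second trigger sees the flag already set and returns without repeating the shutdown work, which is why no assert against repeated calls is needed.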
|
#
cd6f79d1 |
|
29-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks Brian reported a null pointer dereference failure during unmount in xfs/006. He tracked the problem down to the AIL being torn down before a log shutdown had completed and removed all the items from the AIL. The failure occurred in this path while unmount was proceeding in another task: xfs_trans_ail_delete+0x102/0x130 [xfs] xfs_buf_item_done+0x22/0x30 [xfs] xfs_buf_ioend+0x73/0x4d0 [xfs] xfs_trans_committed_bulk+0x17e/0x2f0 [xfs] xlog_cil_committed+0x2a9/0x300 [xfs] xlog_cil_process_committed+0x69/0x80 [xfs] xlog_state_shutdown_callbacks+0xce/0xf0 [xfs] xlog_force_shutdown+0xdf/0x150 [xfs] xfs_do_force_shutdown+0x5f/0x150 [xfs] xlog_ioend_work+0x71/0x80 [xfs] process_one_work+0x1c5/0x390 worker_thread+0x30/0x350 kthread+0xd7/0x100 ret_from_fork+0x1f/0x30 This is processing an EIO error to a log write, and it's triggering a force shutdown. This causes the log to be shut down, and then it is running attached iclog callbacks from the shutdown context. That means the fs and log have already been marked as xfs_is_shutdown/xlog_is_shutdown and so high level code will abort (e.g. xfs_trans_commit(), xfs_log_force(), etc) with an error because of shutdown. The umount would have been blocked waiting for a log force completion inside xfs_log_cover() -> xfs_sync_sb(). The first thing for this situation to occur is for xfs_sync_sb() to exit without waiting for the iclog buffer to be committed to disk. The above trace is the completion routine for the iclog buffer, and it is shutting down the filesystem. xlog_state_shutdown_callbacks() does this: { struct xlog_in_core *iclog; LIST_HEAD(cb_list); spin_lock(&log->l_icloglock); iclog = log->l_iclog; do { if (atomic_read(&iclog->ic_refcnt)) { /* Reference holder will re-run iclog callbacks. 
*/ continue; } list_splice_init(&iclog->ic_callbacks, &cb_list); >>>>>> wake_up_all(&iclog->ic_write_wait); >>>>>> wake_up_all(&iclog->ic_force_wait); } while ((iclog = iclog->ic_next) != log->l_iclog); wake_up_all(&log->l_flush_wait); spin_unlock(&log->l_icloglock); >>>>>> xlog_cil_process_committed(&cb_list); } This wakes any thread waiting on IO completion of the iclog (in this case the umount log force) before shutdown processes all the pending callbacks. That means the xfs_sync_sb() waiting on a sync transaction in xfs_log_force() on iclog->ic_force_wait will get woken before the callbacks attached to that iclog are run. This results in xfs_sync_sb() returning an error, and so unmount unblocks and continues to run whilst the log shutdown is still in progress. Normally this is just fine because the force waiter has nothing to do with AIL operations. But in the case of this unmount path, the log force waiter goes on to tear down the AIL because the log is now shut down and so nothing ever blocks it again from the wait point in xfs_log_cover(). Hence it's a race to see who gets to the AIL first - the unmount code or xlog_cil_process_committed() killing the superblock buffer. To fix this, we just have to change the order of processing in xlog_state_shutdown_callbacks() to run the callbacks before it wakes any task waiting on completion of the iclog. Reported-by: Brian Foster <bfoster@redhat.com> Fixes: aad7272a9208 ("xfs: separate out log shutdown callback processing") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
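The ordering change described above can be illustrated with a small trace recorder: for each iclog, the callbacks run to completion before any waiter is woken. This is only a sketch of the ordering, with hypothetical `demo_*` names; locking, refcount checks and the real callback lists are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum demo_event {
	DEMO_RUN_CALLBACKS,
	DEMO_WAKE_WAITERS,
};

/* Records the order in which shutdown processing steps happen. */
struct demo_trace {
	enum demo_event	ev[8];
	size_t		n;
};

static void demo_record(struct demo_trace *t, enum demo_event e)
{
	if (t->n < sizeof(t->ev) / sizeof(t->ev[0]))
		t->ev[t->n++] = e;
}

/* Stand-in for processing the callbacks attached to one iclog. */
static void demo_run_callbacks(struct demo_trace *t)
{
	demo_record(t, DEMO_RUN_CALLBACKS);
}

/* Stand-in for waking tasks waiting on that iclog. */
static void demo_wake_waiters(struct demo_trace *t)
{
	demo_record(t, DEMO_WAKE_WAITERS);
}

/*
 * Ordering after the fix: per iclog, finish the callbacks first and
 * only then wake waiters, so a woken task cannot start tearing down
 * state that the callbacks still need.
 */
static void demo_shutdown_callbacks(struct demo_trace *t, int nr_iclogs)
{
	for (int i = 0; i < nr_iclogs; i++) {
		demo_run_callbacks(t);
		demo_wake_waiters(t);
	}
}
```

With two iclogs the recorded sequence is callbacks, wake, callbacks, wake; in the buggy ordering the wake events would precede the callback processing.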
|
#
d86142dd |
|
17-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: log items should have a xlog pointer, not a mount Log items belong to the log, not the xfs_mount. Convert the mount pointer in the log item to a xlog pointer in preparation for upcoming log centric changes to the log items. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
a9a4bc8c |
|
17-Mar-2022 |
Dave Chinner <dchinner@redhat.com> |
xfs: log worker needs to start before intent/unlink recovery After 963 iterations of generic/530, it deadlocked during recovery on a pinned inode cluster buffer like so: XFS (pmem1): Starting recovery (logdev: internal) INFO: task kworker/8:0:306037 blocked for more than 122 seconds. Not tainted 5.17.0-rc6-dgc+ #975 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/8:0 state:D stack:13024 pid:306037 ppid: 2 flags:0x00004000 Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker Call Trace: <TASK> __schedule+0x30d/0x9e0 schedule+0x55/0xd0 schedule_timeout+0x114/0x160 __down+0x99/0xf0 down+0x5e/0x70 xfs_buf_lock+0x36/0xf0 xfs_buf_find+0x418/0x850 xfs_buf_get_map+0x47/0x380 xfs_buf_read_map+0x54/0x240 xfs_trans_read_buf_map+0x1bd/0x490 xfs_imap_to_bp+0x4f/0x70 xfs_iunlink_map_ino+0x66/0xd0 xfs_iunlink_map_prev.constprop.0+0x148/0x2f0 xfs_iunlink_remove_inode+0xf2/0x1d0 xfs_inactive_ifree+0x1a3/0x900 xfs_inode_unlink+0xcc/0x210 xfs_inodegc_worker+0x1ac/0x2f0 process_one_work+0x1ac/0x390 worker_thread+0x56/0x3c0 kthread+0xf6/0x120 ret_from_fork+0x1f/0x30 </TASK> task:mount state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000 Call Trace: <TASK> __schedule+0x30d/0x9e0 schedule+0x55/0xd0 schedule_timeout+0x114/0x160 __down+0x99/0xf0 down+0x5e/0x70 xfs_buf_lock+0x36/0xf0 xfs_buf_find+0x418/0x850 xfs_buf_get_map+0x47/0x380 xfs_buf_read_map+0x54/0x240 xfs_trans_read_buf_map+0x1bd/0x490 xfs_imap_to_bp+0x4f/0x70 xfs_iget+0x300/0xb40 xlog_recover_process_one_iunlink+0x4c/0x170 xlog_recover_process_iunlinks.isra.0+0xee/0x130 xlog_recover_finish+0x57/0x110 xfs_log_mount_finish+0xfc/0x1e0 xfs_mountfs+0x540/0x910 xfs_fs_fill_super+0x495/0x850 get_tree_bdev+0x171/0x270 xfs_fs_get_tree+0x15/0x20 vfs_get_tree+0x24/0xc0 path_mount+0x304/0xba0 __x64_sys_mount+0x108/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> task:xfsaild/pmem1 state:D stack:14544 pid:324525 ppid: 2 flags:0x00004000 Call Trace: <TASK> 
__schedule+0x30d/0x9e0 schedule+0x55/0xd0 io_schedule+0x4b/0x80 xfs_buf_wait_unpin+0x9e/0xf0 __xfs_buf_submit+0x14a/0x230 xfs_buf_delwri_submit_buffers+0x107/0x280 xfs_buf_delwri_submit_nowait+0x10/0x20 xfsaild+0x27e/0x9d0 kthread+0xf6/0x120 ret_from_fork+0x1f/0x30 We have the mount process waiting on an inode cluster buffer read, inodegc doing unlink waiting on the same inode cluster buffer, and the AIL push thread blocked in writeback waiting for the inode cluster buffer to become unpinned. What has happened here is that the AIL push thread has raced with the inodegc process modifying, committing and pinning the inode cluster buffer, in xfs_buf_delwri_submit_buffers() here: blk_start_plug(&plug); list_for_each_entry_safe(bp, n, buffer_list, b_list) { if (!wait_list) { if (xfs_buf_ispinned(bp)) { pinned++; continue; } Here >>>>>> if (!xfs_buf_trylock(bp)) continue; Basically, the AIL has found the buffer wasn't pinned and got the lock without blocking, but then the buffer was pinned. This implies the processing here was pre-empted between the pin check and the lock, because the pin count can only be increased while holding the buffer locked. Hence when it has gone to submit the IO, it has blocked waiting for the buffer to be unpinned. With all executing threads now waiting on the buffer to be unpinned, we normally get out of situations like this via the background log worker issuing a log force which will unpin stuck buffers like this. But at this point in recovery, we haven't started the log worker. In fact, the first thing we do after processing intents and unlinked inodes is *start the log worker*. IOWs, we start it too late to have it break deadlocks like this. Avoid this and any other similar deadlock vectors in intent and unlinked inode recovery by starting the log worker before we recover intents and unlinked inodes. 
This part of recovery runs as though the filesystem is fully active, so we really should have the same infrastructure running as we normally do at runtime. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
49add496 |
|
24-Jan-2022 |
Christoph Hellwig <hch@lst.de> |
block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
182696fb |
|
12-Oct-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: rename _zone variables to _cache Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
e7720afa |
|
27-Sep-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: remove kmem_zone typedef Remove these typedefs by referencing kmem_cache directly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
|
#
2e973b2c |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert remaining mount flags to state flags The remaining mount flags kept in m_flags are actually runtime state flags. These change dynamically, so they really should be updated atomically so we don't potentially lose an update due to racing modifications. Convert these remaining flags to be stored in m_opstate and use atomic bitops to set and clear the flags. This also adds a couple of simple wrappers for common state checks - read only and shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
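As a userspace illustration of why atomic updates matter for dynamically changing state, the sketch below keeps state bits in one atomic word and provides small wrappers for the common checks. The `demo_ops_*` names and bit values are hypothetical, and C11 atomics stand in for the kernel's atomic bitops on m_opstate.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits; the values are illustrative only. */
#define DEMO_OPSTATE_SHUTDOWN	(1UL << 0)
#define DEMO_OPSTATE_READONLY	(1UL << 1)

struct demo_ops_mount {
	atomic_ulong opstate;
};

/* Atomically set a state bit; returns whether it was already set. */
static bool demo_ops_set(struct demo_ops_mount *mp, unsigned long bit)
{
	return (atomic_fetch_or(&mp->opstate, bit) & bit) != 0;
}

/* Atomically clear a state bit; returns whether it was previously set. */
static bool demo_ops_clear(struct demo_ops_mount *mp, unsigned long bit)
{
	return (atomic_fetch_and(&mp->opstate, ~bit) & bit) != 0;
}

/* Simple wrappers for the common checks: shutdown and read only. */
static bool demo_ops_is_shutdown(struct demo_ops_mount *mp)
{
	return (atomic_load(&mp->opstate) & DEMO_OPSTATE_SHUTDOWN) != 0;
}

static bool demo_ops_is_readonly(struct demo_ops_mount *mp)
{
	return (atomic_load(&mp->opstate) & DEMO_OPSTATE_READONLY) != 0;
}
```

Each update is a single read-modify-write on the word, so two tasks setting different bits at the same time cannot overwrite each other's change, which is the lost-update problem a plain `flags |= bit` is exposed to.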
|
#
0560f31a |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert mount flags to features Replace m_flags feature checks with xfs_has_<feature>() calls and rework the setup code to set flags in m_features. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
38c26bfd |
|
18-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is a lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
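A minimal sketch of the feature-check pattern the message describes, assuming a features word on the mount and one generated helper per feature. The `DEMO_FEAT_*` bits, the `demo_feat_mount` type and the `DEMO_HAS_FEAT` macro are hypothetical names for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the values are illustrative only. */
#define DEMO_FEAT_REFLINK	(1ULL << 0)
#define DEMO_FEAT_RMAPBT	(1ULL << 1)

struct demo_feat_mount {
	uint64_t features;	/* set once at mount time from the superblock */
};

/* Generate one demo_has_<name>() helper per feature bit. */
#define DEMO_HAS_FEAT(name, NAME) \
static inline bool demo_has_ ## name(const struct demo_feat_mount *mp) \
{ \
	return (mp->features & DEMO_FEAT_ ## NAME) != 0; \
}

DEMO_HAS_FEAT(reflink, REFLINK)
DEMO_HAS_FEAT(rmapbt, RMAPBT)
```

A caller then writes `demo_has_reflink(mp)` instead of reaching through the mount to the superblock version fields, which is the "less typing" the commit message refers to.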
|
#
0020a190 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: AIL needs asynchronous CIL forcing The AIL pushing is stalling on log forces when it comes across pinned items. This is happening on removal workloads where the AIL is dominated by stale items that are removed from the AIL when the checkpoint that marks the items stale is committed to the journal. This results in relatively few items in the AIL, but those that remain are often pinned because the directories that items are being removed from are still being logged. As a result, many push cycles through the CIL will first issue a blocking log force to unpin the items. This can take some time to complete, with tracing regularly showing push delays of half a second and sometimes up into the range of several seconds. Sequences like this aren't uncommon: .... 399.829437: xfsaild: last lsn 0x11002dd000 count 101 stuck 101 flushing 0 tout 20 <wanted 20ms, got 270ms delay> 400.099622: xfsaild: target 0x11002f3600, prev 0x11002f3600, last lsn 0x0 400.099623: xfsaild: first lsn 0x11002f3600 400.099679: xfsaild: last lsn 0x1100305000 count 16 stuck 11 flushing 0 tout 50 <wanted 50ms, got 500ms delay> 400.589348: xfsaild: target 0x110032e600, prev 0x11002f3600, last lsn 0x0 400.589349: xfsaild: first lsn 0x1100305000 400.589595: xfsaild: last lsn 0x110032e600 count 156 stuck 101 flushing 30 tout 50 <wanted 50ms, got 460ms delay> 400.950341: xfsaild: target 0x1100353000, prev 0x110032e600, last lsn 0x0 400.950343: xfsaild: first lsn 0x1100317c00 400.950436: xfsaild: last lsn 0x110033d200 count 105 stuck 101 flushing 0 tout 20 <wanted 20ms, got 200ms delay> 401.142333: xfsaild: target 0x1100361600, prev 0x1100353000, last lsn 0x0 401.142334: xfsaild: first lsn 0x110032e600 401.142535: xfsaild: last lsn 0x1100353000 count 122 stuck 101 flushing 8 tout 10 <wanted 10ms, got 10ms delay> 401.154323: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x1100353000 401.154328: xfsaild: first lsn 0x1100353000 401.154389: xfsaild: last lsn 0x1100353000 count 101 stuck 101 flushing 0 tout 20 <wanted 
20ms, got 300ms delay> 401.451525: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0 401.451526: xfsaild: first lsn 0x1100353000 401.451804: xfsaild: last lsn 0x1100377200 count 170 stuck 22 flushing 122 tout 50 <wanted 50ms, got 500ms delay> 401.933581: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0 .... In each of these cases, every AIL pass saw 101 log items stuck on the AIL (pinned) with very few other items being found. Each pass, a log force was issued, and the delay between last/first is the sleep time + the sync log force time. Some of these 101 items pinned the tail of the log. The tail of the log does slowly creep forward (first lsn), but the problem is that the log is actually out of reservation space because it's been running so many transactions that create stale items, which never reach the AIL but still consume log space. Hence we have a largely empty AIL, with long term pins on items that pin the tail of the log that don't get pushed frequently enough to keep log space available. The problem is the hundreds of milliseconds that we block in the log force pushing the CIL out to disk. The AIL should not be stalled like this - it needs to run and flush items that are at the tail of the log with minimal latency. What we really need to do is trigger a log flush, but then not wait for it at all - we've already done our waiting for stuff to complete when we backed off prior to the log force being issued. Even if we remove the XFS_LOG_SYNC from the xfs_log_force() call, we still do a blocking flush of the CIL and that is what is causing the issue. Hence we need a new interface for the CIL to trigger an immediate background push of the CIL to get it moving faster but not to wait on that to occur. While the CIL is pushing, the AIL can also be pushing. We already have an internal interface to do this - xlog_cil_push_now() - but we need a wrapper for it to be used externally. 
xlog_cil_force_seq() can easily be extended to do what we need as it already implements the synchronous CIL push via xlog_cil_push_now(). Add the necessary flags and "push current sequence" semantics to xlog_cil_force_seq() and convert the AIL pushing to use it. One of the complexities here is that the CIL push does not guarantee that the commit record for the CIL checkpoint is written to disk. The current log force ensures this by submitting the current ACTIVE iclog that the commit record was written to. We need the CIL to actually write this commit record to disk for an async push to ensure that the checkpoint actually makes it to disk and unpins the pinned items in the checkpoint on completion. Hence we need to pass down to the CIL push that we are doing an async flush so that it can switch out the commit_iclog if necessary to get written to disk when the commit iclog is finally released. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
68a74dca |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: order CIL checkpoint start records Because log recovery depends on strictly ordered start records as well as strictly ordered commit records. This is a zero day bug in the way XFS writes pipelined transactions to the journal which is exposed by fixing the zero day bug that prevents the CIL from pipelining checkpoints. This re-introduces explicit concurrent commits back into the on-disk journal and hence out of order start records. The XFS journal commit code has never ordered start records and we have relied on strict commit record ordering for correct recovery ordering of concurrently written transactions. Unfortunately, root cause analysis uncovered the fact that log recovery uses the LSN of the start record for transaction commit processing. Hence, whilst the commits are processed in strict order by recovery, the LSNs associated with the commits can be out of order and so recovery may stamp incorrect LSNs into objects and/or misorder intents in the AIL for later processing. This can result in log recovery failures and/or on disk corruption, sometimes silent. Because this is a long standing log recovery issue, we can't just fix log recovery and call it good. This still leaves older kernels susceptible to recovery failures and corruption when replaying a log from a kernel that pipelines checkpoints. There is also the issue that in-memory ordering for AIL pushing and data integrity operations are based on checkpoint start LSNs, and if the start LSN is incorrect in the journal, it is also incorrect in memory. Hence there's really only one choice for fixing this zero-day bug: we need to strictly order checkpoint start records in ascending sequence order in the log, the same way we already strictly order commit records. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
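The ordering rule stated above (start records written in strictly ascending checkpoint sequence order) can be sketched as a simple gate. This is a single-threaded illustration with hypothetical `demo_cil_order_*` names; where the sketch returns false, a real implementation would have the checkpoint wait for earlier sequences instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker of the last checkpoint whose start record was written. */
struct demo_cil_order {
	uint64_t last_started_seq;
};

/* A checkpoint may write its start record only directly after its predecessor. */
static bool demo_cil_order_may_start(const struct demo_cil_order *cil, uint64_t seq)
{
	return seq == cil->last_started_seq + 1;
}

/*
 * Try to write the start record for checkpoint 'seq'. Returns false if an
 * earlier sequence has not written its start record yet (the caller would
 * have to wait), true once the start record is accounted as written.
 */
static bool demo_cil_order_write_start(struct demo_cil_order *cil, uint64_t seq)
{
	if (!demo_cil_order_may_start(cil, seq))
		return false;
	cil->last_started_seq = seq;
	return true;
}
```

If checkpoint 2 reaches the gate before checkpoint 1, it is held back; once 1 has written its start record, 2 can proceed, so start LSNs in the log ascend with the sequence numbers just as commit records do.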
|
#
caa80090 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: attach iclog callbacks in xlog_cil_set_ctx_write_state() Now that we have a mechanism to guarantee that the callbacks attached to an iclog are owned by the context that attaches them until they drop their reference to the iclog via xlog_state_release_iclog(), we can attach callbacks to the iclog at any time we have an active reference to the iclog. xlog_state_get_iclog_space() always guarantees that the commit record will fit in the iclog it returns, so we can move this IO callback setting to xlog_cil_set_ctx_write_state(), record the commit iclog in the context and remove the need for the commit iclog to be returned by xlog_write() altogether. This, in turn, allows us to move the wakeup for ordered commit record writes up into xlog_cil_set_ctx_write_state(), too, because we have been guaranteed that this commit record will be physically located in the iclog before any waiting commit record at a higher sequence number will be granted iclog space. This further cleans up the post commit record write processing in the CIL push code, especially as xlog_state_release_iclog() will now clean up the context when shutdown errors occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
c45aba40 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: pass a CIL context to xlog_write() Pass the CIL context to xlog_write() rather than a pointer to a LSN variable. Only the CIL checkpoint calls to xlog_write() need to know about the start LSN of the writes, so rework xlog_write to directly write the LSNs into the CIL context structure. This removes the commit_lsn variable from xlog_cil_push_work(), so now we only have to issue the commit record ordering wakeup from there. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2ce82b72 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: move xlog_commit_record to xfs_log_cil.c It is only used by the CIL checkpoints, and is the counterpart to start record formatting and writing that is already local to xfs_log_cil.c. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2562c322 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: log head and tail aren't reliable during shutdown I'm seeing assert failures from xlog_space_left() after a shutdown has begun that look like: XFS (dm-0): log I/O error -5 XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1338 of file fs/xfs/xfs_log.c. Return address = xlog_ioend_work+0x64/0xc0 XFS (dm-0): Log I/O Error Detected. XFS (dm-0): Shutting down filesystem. Please unmount the filesystem and rectify the problem(s) XFS (dm-0): xlog_space_left: head behind tail XFS (dm-0): tail_cycle = 6, tail_bytes = 2706944 XFS (dm-0): GH cycle = 6, GH bytes = 1633867 XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 1310 ------------[ cut here ]------------ Call Trace: xlog_space_left+0xc3/0x110 xlog_grant_push_threshold+0x3f/0xf0 xlog_grant_push_ail+0x12/0x40 xfs_log_reserve+0xd2/0x270 ? __might_sleep+0x4b/0x80 xfs_trans_reserve+0x18b/0x260 ..... There are two things here. Firstly, after a shutdown, the log head and tail can be out of whack as things abort and release (or don't release) resources, so checking them for sanity doesn't make much sense. Secondly, xfs_log_reserve() can race with shutdown and so it can still fail like this even though it has already checked for a log shutdown before calling xlog_grant_push_ail(). So, before ASSERT failing in xlog_space_left(), make sure we haven't already shut down.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
502a01fa |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't run shutdown callbacks on active iclogs When the log is shutdown, it currently walks all the iclogs and runs callbacks that are attached to the iclogs, regardless of whether the iclog is queued for IO completion or not. This creates a problem for contexts attaching callbacks to iclogs in that a racing shutdown can run the callbacks even before the attaching context has finished processing the iclog and releasing it for IO submission. If the callback processing of the iclog frees the structure that is attached to the iclog, then this leads to an UAF scenario that can only be protected against by holding the icloglock from the point callbacks are attached through to the release of the iclog. While we currently do this, it is not practical or sustainable. Hence we need to make shutdown processing the responsibility of the context that holds active references to the iclog. We know that the contexts attaching callbacks to the iclog must have active references to the iclog, and that means they must be in either ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over iclogs in these states -except- when the log is shut down. xlog_state_do_callback() checks the state of the iclogs while holding the icloglock, therefore the reference count/state change that occurs in xlog_state_release_iclog() after the callbacks are atomic w.r.t. shutdown processing. We can't push the responsibility of callback cleanup onto the CIL context because we can have ACTIVE iclogs that have callbacks attached that have already been released. Hence we really need to internalise the cleanup of callbacks into xlog_state_release_iclog() processing. 
Indeed, we already have that internalisation via: xlog_state_release_iclog drop last reference ->SYNCING xlog_sync xlog_write_iclog if (log_is_shutdown) xlog_state_done_syncing() xlog_state_do_callback() <process shutdown on iclog that is now in SYNCING state> The problem is that xlog_state_release_iclog() aborts before doing anything if the log is already shut down. It assumes that the callbacks have already been cleaned up, and it doesn't need to do any cleanup. Hence the fix is to remove the xlog_is_shutdown() check from xlog_state_release_iclog() so that reference counts are correctly released from the iclogs, and when the reference count is zero we always transition to SYNCING if the log is shut down. Hence we'll always enter the xlog_sync() path in a shutdown and eventually end up erroring out the iclog IO and running xlog_state_do_callback() to process the callbacks attached to the iclog. This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs directly in the shutdown code, and in doing so gets rid of the UAF vector that currently exists. This then decouples the adding of callbacks to the iclogs from xlog_state_release_iclog() as we guarantee that xlog_state_release_iclog() will process the callbacks if the log has been shut down before xlog_state_release_iclog() has been called. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
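The hand-off of callback cleanup to the context that drops the last reference might be sketched as follows. This is a single-threaded simplification with no locking; `struct iclog_ref_sketch`, `shutdown_walk()` and `release_iclog()` are hypothetical stand-ins, and the real state machine has more states than shown.

```c
#include <assert.h>
#include <stdbool.h>

enum iclog_state_sketch { ICLOG_ACTIVE, ICLOG_WANT_SYNC, ICLOG_SYNCING };

/* Hypothetical, heavily reduced iclog. */
struct iclog_ref_sketch {
	enum iclog_state_sketch state;
	int refcnt;        /* active references held by writers */
	int callbacks_run; /* counts how often callbacks were processed */
};

/* Shutdown walk: leave referenced ACTIVE/WANT_SYNC iclogs to their holders. */
static void shutdown_walk(struct iclog_ref_sketch *ic)
{
	if (ic->state == ICLOG_ACTIVE || ic->state == ICLOG_WANT_SYNC)
		return;
	ic->callbacks_run++;
}

/* Release: always drop the reference, even after a shutdown. */
static void release_iclog(struct iclog_ref_sketch *ic, bool log_is_shutdown)
{
	assert(ic->refcnt > 0);
	if (--ic->refcnt > 0)
		return;
	if (log_is_shutdown || ic->state == ICLOG_WANT_SYNC)
		ic->state = ICLOG_SYNCING;
	if (log_is_shutdown)
		ic->callbacks_run++; /* stands in for erroring out the IO */
}
```

In this model the callbacks run exactly once, and only after the attaching context has finished with the iclog.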
|
#
aad7272a |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: separate out log shutdown callback processing The iclog callback processing done during a forced log shutdown has different logic to normal runtime IO completion callback processing. Separate out the shutdown callbacks into their own function and call that from the shutdown code instead. We don't need this shutdown specific logic in the normal runtime completion code - we'll always run the shutdown version on shutdown, and it will do what shutdown needs regardless of whether there are racing IO completion callbacks scheduled or in progress. Hence we can also simplify the normal IO completion callpath and only abort if shutdown occurred while we actively were processing callbacks. Further, separating out the IO completion logic from the shutdown logic avoids callback race conditions from being triggered by log IO completion after a shutdown. IO completion will now only run callbacks on iclogs that are in the correct state for a callback to be run, avoiding the possibility of running callbacks on a referenced iclog that hasn't yet been submitted for IO. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
8bb92005 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: rework xlog_state_do_callback() Clean it up a bit by factoring and rearranging some of the code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b36d4651 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: make forced shutdown processing atomic The running of a forced shutdown is a bit of a mess. It does racy checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then does more racy checks in xfs_log_force_unmount() before finally setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the log->icloglock. Move the checking and setting of XFS_MOUNT_SHUTDOWN into xfs_do_force_shutdown() so we only process a shutdown once and once only. Serialise this with the mp->m_sb_lock spinlock so that the state change is atomic and won't race. Move all the mount specific shutdown state changes from xfs_log_force_unmount() to xfs_do_force_shutdown() so they are done atomically with setting XFS_MOUNT_SHUTDOWN. Then get rid of the racy xlog_is_shutdown() check from xlog_force_shutdown(), and gate the log shutdown on the test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This means that the log is shutdown once and once only, and code that needs to prevent races with shutdown can do so by holding the icloglock and checking the return value of xlog_is_shutdown(). This results in a predictable shutdown execution process - we set the shutdown flags once and process the shutdown once rather than the current "as many concurrent shutdowns as can race to the flag setting" situation we have now. Also, now that shutdown is atomic, always emit a stack trace when the error level for the filesystem is high enough. This means that we always get a stack trace when trying to diagnose the cause of shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE cases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
e1d06e5f |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert log flags to an operational state field log->l_flags doesn't actually contain "flags" as such, it contains operational state information that can change at runtime. For the shutdown state, this at least should be an atomic bit because it is read without holding locks in many places and so using atomic bitops for the state field modifications makes sense. This allows us to use things like test_and_set_bit() on state changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the state when we aren't holding locks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
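A rough user-space analogue of an atomic bit field for operational state is shown below. The helper names and the bit number are hypothetical; C11 `atomic_fetch_or()` is used here in place of the kernel's test_and_set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define OPSTATE_TAIL_WARN_SKETCH 2u /* hypothetical bit number */

/* Atomically set a bit and report whether it was already set. */
static bool opstate_test_and_set(atomic_ulong *state, unsigned int bit)
{
	unsigned long mask;

	assert(bit < sizeof(unsigned long) * 8);
	mask = 1UL << bit;
	return (atomic_fetch_or(state, mask) & mask) != 0;
}

/* Lockless read of a single state bit. */
static bool opstate_test(atomic_ulong *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned long) * 8);
	return (atomic_load(state) & (1UL << bit)) != 0;
}
```

A caller that wants to emit a warning only once can act on a `false` return from `opstate_test_and_set()` without holding any lock.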
|
#
fd67d8a0 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: move recovery needed state updates to xfs_log_mount_finish xfs_log_mount_finish() needs to know if recovery is needed or not to make decisions on whether to flush the log and AIL. Move the handling of the NEED_RECOVERY state out to this function rather than needing a temporary variable to store this state over the call to xlog_recover_finish(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
5112e206 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: XLOG_STATE_IOERROR must die We don't need an iclog state field to tell us the log has been shut down. We can just check the xlog_is_shutdown() instead. This avoids the need to have shutdown overwrite the current iclog state while it is actively used by the log code and so having to ensure that every iclog state check handles XLOG_STATE_IOERROR appropriately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2039a272 |
|
10-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert XLOG_FORCED_SHUTDOWN() to xlog_is_shutdown() Make it less shouty and a static inline before adding more calls through the log code. Also convert internal log code that uses XFS_FORCED_SHUTDOWN(mount) to use xlog_is_shutdown(log) as well. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
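The style change from a shouting macro to a static inline might look like the sketch below. The struct, flag value and both helper names are hypothetical and only mirror the shape of the conversion, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IO_ERROR 0x1u /* hypothetical flag value */

struct log_flags_sketch {
	unsigned int flags;
};

/* Old style: an upper-case macro with no type checking of its argument. */
#define SKETCH_FORCED_SHUTDOWN(log) (((log)->flags & SKETCH_IO_ERROR) != 0)

/* New style: a type-checked static inline that returns a real bool. */
static inline bool sketch_is_shutdown(const struct log_flags_sketch *log)
{
	assert(log != NULL);
	return (log->flags & SKETCH_IO_ERROR) != 0;
}
```

Both forms give the same answer; the inline version lets the compiler reject a pointer of the wrong type.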
|
#
2b73a2c8 |
|
08-Aug-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: clear log incompat feature bits when the log is idle When there are no ongoing transactions and the log contents have been checkpointed back into the filesystem, the log performs 'covering', which is to say that it logs a dummy transaction to record the fact that the tail has caught up with the head. This is a good time to clear log incompat feature flags, because they are flags that are temporarily set to limit the range of kernels that can replay a dirty log. Since it's possible that some other higher level thread is about to start logging items protected by a log incompat flag, we create a rwsem so that upper level threads can coordinate this with the log. It would probably be more performant to use a percpu rwsem, but the ability to /try/ taking the write lock during covering is critical, and percpu rwsems do not provide that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
|
#
908ce71e |
|
08-Aug-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: allow setting and clearing of log incompat feature flags Log incompat feature flags in the superblock exist for one purpose: to protect the contents of a dirty log from replay on a kernel that isn't prepared to handle those dirty contents. This means that they can be cleared if (a) we know the log is clean and (b) we know that there aren't any other threads in the system that might be setting or relying upon a log incompat flag. Therefore, clear the log incompat flags when we've finished recovering the log, when we're unmounting cleanly, remounting read-only, or freezing; and provide a function so that subsequent patches can start using this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
|
#
d634525d |
|
09-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: replace kmem_alloc_large() with kvmalloc() There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
98fe2c3c |
|
09-Aug-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove kmem_alloc_io() Since commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), the core slab code now guarantees slab alignment in all situations sufficient for IO purposes (i.e. minimum of 512 byte alignment of >= 512 byte sized heap allocations) we no longer need the workaround in the XFS code to provide this guarantee. Replace the use of kmem_alloc_io() with kmem_alloc() or kmem_alloc_large() appropriately, and remove the kmem_alloc_io() interface altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
9d110014 |
|
28-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: limit iclog tail updates From the department of "generic/482 keeps on giving", we bring you another tail update race condition: iclog: S1 C1 +-----------------------+-----------------------+ S2 EOIC Two checkpoints in a single iclog. One is complete, the other just contains the start record and overruns into a new iclog. Timeline: Before S1: Cache flush, log tail = X At S1: Metadata stable, write start record and checkpoint At C1: Write commit record, set NEED_FUA Single iclog checkpoint, so no need for NEED_FLUSH Log tail still = X, so no need for NEED_FLUSH After C1, Before S2: Cache flush, log tail = X At S2: Metadata stable, write start record and checkpoint After S2: Log tail moves to X+1 At EOIC: End of iclog, more journal data to write Releases iclog Not a commit iclog, so no need for NEED_FLUSH Writes log tail X+1 into iclog. At this point, the iclog has tail X+1 and NEED_FUA set. There has been no cache flush for the metadata between X and X+1, and the iclog writes the new tail permanently to the log. This is sufficient to violate on disk metadata/journal ordering. We have two options here. The first is to detect this case in some manner and ensure that the partial checkpoint write sets NEED_FLUSH when the iclog is already marked NEED_FUA and the log tail changes. This seems somewhat fragile and quite complex to get right, and it doesn't actually make it obvious what underlying problem it is actually addressing from reading the code. The second option seems much cleaner to me, because it is derived directly from the requirements of the C1 commit record in the iclog. That is, when we write this commit record to the iclog, we've guaranteed that the metadata/data ordering is correct for tail update purposes. Hence if we only write the log tail into the iclog for the *first* commit record rather than the log tail at the last release, we guarantee that the log tail does not move past where the first commit record in the log expects it to be. 
IOWs, taking the first option means that replay of C1 becomes dependent on future operations doing the right thing, not just the C1 checkpoint itself doing the right thing. This makes log recovery almost impossible to reason about because now we have to take into account what might or might not have happened in the future when looking at checkpoints in the log rather than just having to reconstruct the past... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
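The second option, recording the tail only for the first commit record in an iclog, can be sketched as below. This is an illustration under assumptions: `struct iclog_tail_sketch` and both helpers are hypothetical, LSNs are plain integers, and a stored tail of 0 is assumed to mean "not yet set".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced iclog header: only the tail LSN matters here. */
struct iclog_tail_sketch {
	uint64_t tail_lsn; /* 0 is assumed to mean "not yet recorded" */
};

/* Called when a commit record is written into the iclog. */
static void iclog_note_commit(struct iclog_tail_sketch *ic, uint64_t cur_tail)
{
	assert(ic != NULL && cur_tail != 0);
	if (ic->tail_lsn == 0)
		ic->tail_lsn = cur_tail; /* first commit record wins */
}

/* Called on final release; returns the tail that will be written to disk. */
static uint64_t iclog_release_tail(struct iclog_tail_sketch *ic,
				   uint64_t cur_tail)
{
	assert(ic != NULL && cur_tail != 0);
	if (ic->tail_lsn == 0)
		ic->tail_lsn = cur_tail; /* no commit record in this iclog */
	return ic->tail_lsn;
}
```

With the timeline from the commit message, the iclog keeps tail X from C1 even though the log tail has moved to X+1 by the time the iclog is released at EOIC.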
|
#
8191d822 |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: avoid unnecessary waits in xfs_log_force_lsn() Before waiting on an iclog in xfs_log_force_lsn(), we don't check to see if the iclog has already been completed and the contents on stable storage. We check for completed iclogs in xfs_log_force(), so we should do the same thing for xfs_log_force_lsn(). This fixed some random up-to-30s pauses seen in unmounting filesystems in some tests. A log force ends up waiting on a completed iclog, and that doesn't then get flushed (and hence the log force gets completed) until the background log worker issues a log force that flushes the iclog in question. Then the unmount unblocks and continues. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
2bf1ec0f |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: log forces imply data device cache flushes After fixing the tail_lsn vs cache flush race, generic/482 continued to fail in a similar way where cache flushes were missing before iclog FUA writes. Tracing of iclog state changes during the fsstress workload portion of the test (via xlog_iclog* events) indicated that iclog writes were coming from two sources - CIL pushes and log forces (due to fsync/O_SYNC operations). All of the cases where a recovery problem was triggered indicated that the log force was the source of the iclog write that was not preceded by a cache flush. This was an oversight in the modifications made in commit eef983ffeae7 ("xfs: journal IO cache flush reductions"). Log forces for fsync imply a data device cache flush has been issued if an iclog was flushed to disk and is indicated to the caller via the log_flushed parameter so they can elide the device cache flush if the journal issued one. The change in eef983ffeae7 results in iclogs only issuing a cache flush if XLOG_ICL_NEED_FLUSH is set on the iclog, but this was not added to the iclogs that the log force code flushes to disk. Hence log forces are no longer guaranteeing that a cache flush is issued, hence opening up a potential on-disk ordering failure. Log forces should also set XLOG_ICL_NEED_FUA to ensure that the actual iclogs it forces to the journal are also on stable storage before it returns to the caller. This patch introduces the xlog_force_iclog() helper function to encapsulate the process of taking a reference to an iclog, switching its state if WANT_SYNC and flushing it to stable storage correctly. Both xfs_log_force() and xfs_log_force_lsn() are converted to use it, as is xlog_unmount_write() which has an elaborate method of doing exactly the same "write this iclog to stable storage" operation. Further, if the log force code needs to wait on an iclog in the WANT_SYNC state, it needs to ensure that iclog also results in a cache flush being issued. 
This covers the case where the iclog contains the commit record of the CIL flush that the log force triggered, but it hasn't been written yet because there is still an active reference to the iclog. Note: this whole cache flush whack-a-mole patch is a result of log forces still being iclog state centric rather than being CIL sequence centric. Most of this nasty code will go away in future when log forces are converted to wait on CIL sequence push completion rather than iclog completion. With the CIL push algorithm guaranteeing that the CIL checkpoint is fully on stable storage when it completes, we no longer need to iterate iclogs and push them to ensure a CIL sequence push has completed and so all this nasty iclog iteration and flushing code will go away. Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
45eddb41 |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor out forced iclog flushes We force iclogs in several places - we need them all to have the same cache flush semantics, so start by factoring out the iclog force into a common helper. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
0dc8f7f1 |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix ordering violation between cache flushes and tail updates There is a race between the new CIL async data device metadata IO completion cache flush and the log tail in the iclog the flush covers being updated. This can be seen by repeating generic/482 in a loop and eventually log recovery fails with a failure such as this: XFS (dm-3): Starting recovery (logdev: internal) XFS (dm-3): bad inode magic/vsn daddr 228352 #0 (magic=0) XFS (dm-3): Metadata corruption detected at xfs_inode_buf_verify+0x180/0x190, xfs_inode block 0x37c00 xfs_inode_buf_verify XFS (dm-3): Unmount and run xfs_repair XFS (dm-3): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (dm-3): metadata I/O error in "xlog_recover_items_pass2+0x55/0xc0" at daddr 0x37c00 len 32 error 117 Analysis of the logwrite replay shows that there were no writes to the data device between the FUA @ write 124 and the FUA at write @ 125, but log recovery @ 125 failed. The difference was the one log write @ 125 moved the tail of the log forwards from (1,8) to (1,32) and so the inode create intent in (1,8) was not replayed and so the inode cluster was zero on disk when replay of the first inode item in (1,32) was attempted. What this meant was that the journal write that occurred at @ 125 did not ensure that metadata completed before the iclog was written was correctly on stable storage. 
The tail of the log moved forward, so IO must have been completed between the two iclog writes. This means that there is a race condition between the unconditional async cache flush in the CIL push work and the tail LSN that is written to the iclog. This happens like so: CIL push work AIL push work ------------- ------------- Add to committing list start async data dev cache flush ..... <flush completes> <all writes to old tail lsn are stable> xlog_write .... push inode create buffer <start IO> ..... xlog_write(commit record) .... <IO completes> log tail moves xlog_assign_tail_lsn() start_lsn == commit_lsn <no iclog preflush!> xlog_state_release_iclog __xlog_state_release_iclog() <writes *new* tail_lsn into iclog> xlog_sync() .... submit_bio() <tail in log moves forward without flushing written metadata> Essentially, this can only occur if the commit iclog is issued without a cache flush. If the iclog bio is submitted with REQ_PREFLUSH, then it will guarantee that all the completed IO is on stable storage before the iclog bio with the new tail LSN in it is written to the log. IOWs, the tail lsn that is written to the iclog needs to be sampled *before* we issue the cache flush that guarantees all IO up to that LSN has been completed. To fix this without giving up the performance advantage of the flush/FUA optimisations (e.g. g/482 runtime halves with 5.14-rc1 compared to 5.13), we need to ensure that we always issue a cache flush if the tail LSN changes between the initial async flush and the commit record being written. This requires sampling the tail_lsn before we start the flush, and then passing the sampled tail LSN to xlog_state_release_iclog() so it can determine if the tail LSN has changed while writing the checkpoint. If the tail LSN has changed, then it needs to set the NEED_FLUSH flag on the iclog and we'll issue another cache flush before writing the iclog. 
Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
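The "sample the tail before the flush, compare at release" rule described in this commit can be sketched as follows. The struct, flag values and function name are hypothetical; only the comparison logic reflects the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ICL_NEED_FLUSH_SKETCH 0x1u /* hypothetical flag values */
#define ICL_NEED_FUA_SKETCH   0x2u

/* Hypothetical reduced commit iclog. */
struct commit_iclog_sketch {
	unsigned int flags;
	uint64_t tail_lsn; /* tail LSN that will be written with the iclog */
};

/*
 * old_tail_lsn was sampled before the async cache flush was started;
 * cur_tail_lsn is the tail at release time. If the tail moved in between,
 * the earlier flush does not cover it, so request another flush.
 */
static void release_commit_iclog(struct commit_iclog_sketch *ic,
				 uint64_t old_tail_lsn, uint64_t cur_tail_lsn)
{
	assert(ic != NULL);
	if (old_tail_lsn != 0 && cur_tail_lsn != old_tail_lsn)
		ic->flags |= ICL_NEED_FLUSH_SKETCH;
	ic->tail_lsn = cur_tail_lsn;
}
```

When the tail has not moved, no extra flush is requested and the flush/FUA optimisation is preserved.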
|
#
9d392064 |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog Fold __xlog_state_release_iclog into its only caller to make an upcoming fix easier. Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b5d721ea |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: external logs need to flush data device The recent journal flush/FUA changes replaced the flushing of the data device on every iclog write with an up-front async data device cache flush. Unfortunately, the assumption on which this was based has been proven incorrect by the flush vs log tail update ordering issue. As the fix for that issue uses the XLOG_ICL_NEED_FLUSH flag to indicate that the data device needs a cache flush, we now need to (once again) ensure that an iclog write to an external log that needs a cache flush actually issues a cache flush to the data device as well as the log device. Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
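The flush decision for internal versus external logs can be reduced to a small truth table. This is a sketch only: `struct flush_plan_sketch` and `plan_iclog_write()` are hypothetical and no real block-layer calls are made.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of which cache flushes an iclog write needs. */
struct flush_plan_sketch {
	bool preflush_log_dev; /* flush the device the iclog is written to */
	bool flush_data_dev;   /* separate flush of the data device */
};

/*
 * need_flush: the iclog was marked as needing a cache flush.
 * external_log: the log lives on a different device than the data.
 */
static struct flush_plan_sketch plan_iclog_write(bool need_flush,
						 bool external_log)
{
	struct flush_plan_sketch plan;

	plan.preflush_log_dev = need_flush;
	/* With an internal log, the log device flush covers the data too. */
	plan.flush_data_dev = need_flush && external_log;
	assert(!plan.flush_data_dev || plan.preflush_log_dev);
	return plan;
}
```

Only the combination of an external log and a flush-marked iclog requires the additional data device flush.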
|
#
b1e27239 |
|
27-Jul-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: flush data dev on external log write We incorrectly flush the log device instead of the data device when trying to ensure metadata is correctly on disk before writing the unmount record. Fixes: eef983ffeae7 ("xfs: journal IO cache flush reductions") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
a1bb8505 |
|
25-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: Fix a CIL UAF by getting rid of the iclog callback lock The iclog callback chain has its own lock. That was added way back in 2008 by myself to alleviate severe lock contention on the icloglock in commit 114d23aae512 ("[XFS] Per iclog callback chain lock"). This was long before delayed logging took the icloglock out of the hot transaction commit path and removed all contention on it. Hence the separate ic_callback_lock doesn't serve any scalability purpose anymore, and hasn't for close on a decade. Further, we only attach callbacks to iclogs in one place where we are already taking the icloglock soon after attaching the callbacks. We also have to drop the icloglock to run callbacks and grab it immediately afterwards again. So given that the icloglock is no longer hot, making it cover callbacks again doesn't really change the locking patterns very much at all. We also need to extend the icloglock to cover callback addition to fix a zero-day UAF in the CIL push code. This occurs when shutdown races with xlog_cil_push_work() and the shutdown runs the callbacks before the push releases the iclog. This results in the CIL context structure attached to the iclog being freed by the callback before the CIL push has finished referencing it, leading to UAF bugs. Hence, to avoid this UAF, we need the callback attachment to be atomic with post processing of the commit iclog and references to the structures being attached to the iclog. This requires holding the icloglock as that's the only way to serialise iclog state against a shutdown in progress. The result is we need to be using the icloglock to protect the callback list addition and removal and serialise them with shutdown. That makes the ic_callback_lock redundant and so it can be removed. Fixes: 71e330b59390 ("xfs: Introduce delayed logging core code") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b6903358 |
|
25-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove callback dequeue loop from xlog_state_do_iclog_callbacks If we are processing callbacks on an iclog, nothing can be concurrently adding callbacks to the loop. We only add callbacks to the iclog when they are in ACTIVE or WANT_SYNC state, and we explicitly do not add callbacks if the iclog is already in IOERROR state. The only way to have a dequeue racing with an enqueue is to be processing a shutdown without a direct reference to an iclog in ACTIVE or WANT_SYNC state. As the enqueue avoids this race condition, we only ever need a single dequeue operation in xlog_state_do_iclog_callbacks(). Hence we can remove the loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
6be00102 |
|
25-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't nest icloglock inside ic_callback_lock It's completely unnecessary because callbacks are added to iclogs without holding the icloglock, hence no amount of ordering between the icloglock and ic_callback_lock will order the removal of callbacks from the iclog. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
4e6b8270 |
|
18-Jun-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: force the log offline when log intent item recovery fails If any part of log intent item recovery fails, we should shut down the log immediately to stop the log from writing a clean unmount record to disk, because the metadata is not consistent. The inability to cancel a dirty transaction catches most of these cases, but there are a few things that have slipped through the cracks, such as ENOSPC from a transaction allocation, or runtime errors that result in cancellation of a non-dirty transaction. This solves some weird behaviors reported by customers where a system goes down, the first mount fails, the second succeeds, but then the fs goes down later because of inconsistent metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
956f6daa |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: add iclog state trace events For the DEBUGS! Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
5f9b4b0d |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_log_force_lsn isn't passed a LSN In doing an investigation into AIL push stalls, I was looking at the log force code to see if an async CIL push could be done instead. This led me to xfs_log_force_lsn() and looking at how it works. xfs_log_force_lsn() is only called from inode synchronisation contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn value as the LSN to sync the log to. This gets passed to xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the journal, and then used by xfs_log_force_lsn() to flush the iclogs to the journal. The problem is that ip->i_itemp->ili_last_lsn does not store a log sequence number. What it stores is passed to it from the ->iop_committing method, which is called by xfs_log_commit_cil(). The value this passes to the iop_committing method is the CIL context sequence number that the item was committed to. As it turns out, xlog_cil_force_lsn() converts the sequence to an actual commit LSN for the related context and returns that to xfs_log_force_lsn(). xfs_log_force_lsn() overwrites its "lsn" variable that contained a sequence with an actual LSN and then uses that to sync the iclogs. This caused me some confusion for a while, even though I originally wrote all this code a decade ago. ->iop_committing is only used by a couple of log item types, and only inode items use the sequence number it is passed. Let's clean up the API, CIL structures and inode log item to call it a sequence number, and make it clear that the high level code is using CIL sequence numbers and not on-disk LSNs for integrity synchronisation purposes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
eef983ff |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: journal IO cache flush reductions Currently every journal IO is issued as REQ_PREFLUSH | REQ_FUA to guarantee the ordering requirements the journal has w.r.t. metadata writeback. The two ordering constraints are: 1. we cannot overwrite metadata in the journal until we guarantee that the dirty metadata has been written back in place and is stable. 2. we cannot write back dirty metadata until it has been written to the journal and guaranteed to be stable (and hence recoverable) in the journal. The ordering guarantees of #1 are provided by REQ_PREFLUSH. This causes the journal IO to issue a cache flush and wait for it to complete before issuing the write IO to the journal. Hence all completed metadata IO is guaranteed to be stable before the journal overwrites the old metadata. The ordering guarantees of #2 are provided by the REQ_FUA, which ensures the journal writes do not complete until they are on stable storage. Hence by the time the last journal IO in a checkpoint completes, we know that the entire checkpoint is on stable storage and we can unpin the dirty metadata and allow it to be written back. This is the mechanism by which ordering was first implemented in XFS way back in 2002 by commit 95d97c36e5155075ba2eb22b17562cfcc53fcf96 ("Add support for drive write cache flushing") in the xfs-archive tree. A lot has changed since then, most notably we now use delayed logging to checkpoint the filesystem to the journal rather than write each individual transaction to the journal. Cache flushes on journal IO are necessary when individual transactions are wholly contained within a single iclog. However, CIL checkpoints are single transactions that typically span hundreds to thousands of individual journal writes, and so the requirements for device cache flushing have changed. That is, the ordering rules I state above apply to ordering of atomic transactions recorded in the journal, not to the journal IO itself. 
Hence we need to ensure metadata is stable before we start writing a new transaction to the journal (guarantee #1), and we need to ensure the entire transaction is stable in the journal before we start metadata writeback (guarantee #2). Hence we only need a REQ_PREFLUSH on the journal IO that starts a new journal transaction to provide #1, and it is not needed on any other journal IO done within the context of that journal transaction. The CIL checkpoint already issues a cache flush before it starts writing to the log, so we no longer need the iclog IO to issue a REQ_PREFLUSH for us. Hence if XLOG_START_TRANS is passed to xlog_write(), we no longer need to mark the first iclog in the log write with REQ_PREFLUSH for this case. As an added bonus, this ordering mechanism works for both internal and external logs, meaning we can remove the explicit data device cache flushes from the iclog write code when using external logs. Given the new ordering semantics of commit records for the CIL, we need iclogs containing commit records to issue a REQ_PREFLUSH. We also require unmount records to do this. Hence for both XLOG_COMMIT_TRANS and XLOG_UNMOUNT_TRANS xlog_write() calls we need to mark the first iclog being written with REQ_PREFLUSH. For both commit records and unmount records, we also want them immediately on stable storage, so we want to also mark the iclogs that contain these records to be marked REQ_FUA. That means if a record is split across multiple iclogs, they are all marked REQ_FUA and not just the last one so that when the transaction is completed all the parts of the record are on stable storage. And for external logs, unmount records need a pre-write data device cache flush similar to the CIL checkpoint cache pre-flush as the internal iclog write code does not do this implicitly anymore. 
As an optimisation, when the commit record lands in the same iclog as the journal transaction starts, we don't need to wait for anything and can simply use REQ_FUA to provide guarantee #2. This means that for fsync() heavy workloads, the cache flush behaviour is completely unchanged and there is no degradation in performance as a result of optimising the multi-IO transaction case. The most notable sign that there is less IO latency on my test machine (nvme SSDs) is that the "noiclogs" rate has dropped substantially. This metric indicates that the CIL push is blocking in xlog_get_iclog_space() waiting for iclog IO completion to occur. With 8 iclogs of 256kB, the rate is approximately 1 noiclog event to every 4 iclog writes. IOWs, every 4th call to xlog_get_iclog_space() is blocking waiting for log IO. With the changes in this patch, this drops to 1 noiclog event for every 100 iclog writes. Hence it is clear that log IO is completing much faster than it was previously, but it is also clear that for large iclog sizes, this isn't the performance limiting factor on this hardware. With smaller iclogs (32kB), however, there is a substantial difference. With the cache flush modifications, the journal is now running at over 4000 write IOPS, and the journal throughput is largely identical to the 256kB iclogs and the noiclog event rate stays low at about 1:50 iclog writes. The existing code tops out at about 2500 IOPS as the number of cache flushes dominate performance and latency. The noiclog event rate is about 1:4, and the performance variance is quite large as the journal throughput can fall to less than half the peak sustained rate when the cache flush rate prevents metadata writeback from keeping up and the log runs out of space and throttles reservations. 
As a result:

            logbsize   fsmark create rate   rm -rf
before      32kb       152851+/-5.3e+04     5m28s
patched     32kb       221533+/-1.1e+04     5m24s
before      256kb      220239+/-6.2e+03     4m58s
patched     256kb      228286+/-9.2e+03     5m06s

The rm -rf times are included because I ran them, but the differences are largely noise. This workload is largely metadata read IO latency bound and the changes to the journal cache flushing don't really make any noticeable difference to behaviour apart from a reduction in noiclog events from background CIL pushing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
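The flag selection described in this commit message can be sketched roughly as follows. This is an illustration only, not the kernel code: the SKETCH_* flag bits, the sketch_rec enum and sketch_iclog_flags() are hypothetical names standing in for REQ_PREFLUSH, REQ_FUA and the xlog_write() op types.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the block layer's REQ_PREFLUSH / REQ_FUA bits. */
#define SKETCH_PREFLUSH (1u << 0)
#define SKETCH_FUA      (1u << 1)

/* Hypothetical record kinds, loosely mirroring the xlog_write() op types. */
enum sketch_rec {
	SKETCH_REC_START,   /* iclog that starts a checkpoint */
	SKETCH_REC_DATA,    /* any other iclog inside the checkpoint */
	SKETCH_REC_COMMIT,  /* iclog holding the commit record */
	SKETCH_REC_UNMOUNT, /* iclog holding the unmount record */
};

/*
 * Pick the cache flush flags for one iclog write.
 * commit_shares_start_iclog models the optimisation where the commit
 * record lands in the same iclog that started the checkpoint.
 */
static unsigned int
sketch_iclog_flags(enum sketch_rec rec, bool commit_shares_start_iclog)
{
	switch (rec) {
	case SKETCH_REC_COMMIT:
		/* Nothing earlier to order against: FUA alone gives guarantee #2. */
		if (commit_shares_start_iclog)
			return SKETCH_FUA;
		return SKETCH_PREFLUSH | SKETCH_FUA;
	case SKETCH_REC_UNMOUNT:
		return SKETCH_PREFLUSH | SKETCH_FUA;
	case SKETCH_REC_START:
	case SKETCH_REC_DATA:
	default:
		/* The CIL push already flushed the cache before writing. */
		return 0;
	}
}
```

Under these assumptions only the iclogs that carry commit or unmount records pay for a flush, which is where the reduction in cache flushes comes from.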
|
#
3468bb1c |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove need_start_rec parameter from xlog_write() The CIL push is the only call to xlog_write that sets this variable to true. The other callers don't need a start rec, and they tell xlog_write what to do by passing the type of ophdr they need written in the flags field. The need_start_rec parameter essentially tells xlog_write to write an extra ophdr with a XLOG_START_TRANS type, so get rid of the variable to do this and pass XLOG_START_TRANS as the flag value into xlog_write() from the CIL push.

$ size fs/xfs/xfs_log.o*
   text    data     bss     dec     hex filename
  27595     560       8   28163    6e03 fs/xfs/xfs_log.o.orig
  27454     560       8   28022    6d76 fs/xfs/xfs_log.o.patched

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
b5071ada |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove xfs_blkdev_issue_flush It's a one line wrapper around blkdev_issue_flush(). Just replace it with direct calls to blkdev_issue_flush(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
a79b28c2 |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: separate CIL commit record IO To allow for iclog IO device cache flush behaviour to be optimised, we first need to separate out the commit record iclog IO from the rest of the checkpoint so we can wait for the checkpoint IO to complete before we issue the commit record. This separation is only necessary if the commit record is being written into a different iclog to the start of the checkpoint as the upcoming cache flushing changes require completion ordering against the other iclogs submitted by the checkpoint. If the entire checkpoint and commit is in the one iclog, then they are both covered by the one set of cache flush primitives on the iclog and hence there is no need to separate them for ordering. Otherwise, we need to wait for all the previous iclogs to complete so they are ordered correctly and made stable by the REQ_PREFLUSH that the commit record iclog IO issues. This guarantees that if a reader sees the commit record in the journal, they will also see the entire checkpoint that commit record closes off. This also provides the guarantee that when the commit record IO completes, we can safely unpin all the log items in the checkpoint so they can be written back because the entire checkpoint is stable in the journal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
18842e0a |
|
18-Jun-2021 |
Geert Uytterhoeven <geert@linux-m68k.org> |
xfs: Fix 64-bit division on 32-bit in xlog_state_switch_iclogs() On 32-bit (e.g. m68k): ERROR: modpost: "__udivdi3" [fs/xfs/xfs.ko] undefined! Fix this by using a uint32_t intermediate, like before. Reported-by: noreply@ellerman.id.au Fixes: 7660a5b48fbef958 ("xfs: log stripe roundoff is a property of the log") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
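A minimal sketch of the idea behind this fix, assuming the byte-to-block conversion yields a 64-bit value: narrowing it to a uint32_t before rounding keeps the division 32-bit, so a 32-bit build does not need the __udivdi3 helper. All names here (sketch_btobb, sketch_round_up_block, the SKETCH_* constants) are hypothetical and not the kernel's.

```c
#include <stdint.h>

#define SKETCH_BBSHIFT 9                      /* 512-byte basic blocks */
#define SKETCH_BBSIZE  (1u << SKETCH_BBSHIFT)

/* Bytes to basic blocks, rounding up; deliberately 64-bit like a generic helper. */
static uint64_t
sketch_btobb(uint64_t bytes)
{
	return (bytes + SKETCH_BBSIZE - 1) >> SKETCH_BBSHIFT;
}

/* Round the current block number up to the next roundoff boundary. */
static uint32_t
sketch_round_up_block(uint32_t curr_block, uint32_t roundoff_bytes)
{
	if (roundoff_bytes > SKETCH_BBSIZE) {
		/* 32-bit intermediate: the division below stays 32-bit. */
		uint32_t sunit_bb = (uint32_t)sketch_btobb(roundoff_bytes);

		curr_block = ((curr_block + sunit_bb - 1) / sunit_bb) * sunit_bb;
	}
	return curr_block;
}
```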
|
#
a6a65fef |
|
18-Jun-2021 |
Dave Chinner <dchinner@redhat.com> |
xfs: log stripe roundoff is a property of the log We don't need to look at the xfs_mount and superblock every time we need to do an iclog roundoff calculation. The property is fixed for the life of the log, so store the roundoff in the log at mount time and use that everywhere. On a debug build:

$ size fs/xfs/xfs_log.o.*
   text    data     bss     dec     hex filename
  27360     560       8   27928    6d18 fs/xfs/xfs_log.o.orig
  27219     560       8   27787    6c8b fs/xfs/xfs_log.o.patched

Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
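The pattern described here, computing the roundoff once and caching it in the log structure, might look like the sketch below. The struct, field and function names are hypothetical, and the rule used to pick the roundoff (stripe unit for v2 logs with a stripe unit larger than 1, otherwise one 512-byte basic block) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BBSIZE 512u

/* Hypothetical, heavily reduced stand-in for the in-core log structure. */
struct sketch_log {
	uint32_t iclog_roundoff; /* fixed for the life of the log */
};

/* Called once at mount time: decide the roundoff and cache it. */
static void
sketch_log_init(struct sketch_log *log, bool v2_log, uint32_t log_stripe_unit)
{
	if (v2_log && log_stripe_unit > 1)
		log->iclog_roundoff = log_stripe_unit;
	else
		log->iclog_roundoff = SKETCH_BBSIZE;
}

/* Later callers use the cached value instead of consulting the superblock. */
static uint32_t
sketch_round_iclog_size(const struct sketch_log *log, uint32_t bytes)
{
	uint32_t r = log->iclog_roundoff;

	return ((bytes + r - 1) / r) * r;
}
```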
|
#
53004ee7 |
|
20-Apr-2021 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
xfs: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fall through */ comments, and its variants, with the new pseudo-keyword macro fallthrough: fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] 
fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Notice that Clang doesn't recognize /* fall through */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
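For readers unfamiliar with the fallthrough pseudo-keyword, a small user-space sketch is shown below. The macro definition approximates what the kernel headers provide and falls back to an empty statement on compilers without the attribute; sketch_level_mask() is a made-up function that exists only to show the annotation in use.

```c
/* Approximation of the kernel's fallthrough macro for user-space builds. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

/* Hypothetical example: each level also includes the bits of lower levels. */
static int
sketch_level_mask(int level)
{
	int mask = 0;

	switch (level) {
	case 2:
		mask |= 4;
		fallthrough;	/* deliberate: level 2 implies level 1 */
	case 1:
		mask |= 2;
		fallthrough;	/* deliberate: level 1 implies level 0 */
	case 0:
		mask |= 1;
		break;
	default:
		mask = -1;
		break;
	}
	return mask;
}
```

Unlike a /* fall through */ comment, the statement form is visible to both GCC and Clang, so -Wimplicit-fallthrough stays quiet for the annotated cases.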
|
#
8e9800f9 |
|
29-Apr-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: don't allow log writes if the data device is readonly While running generic/050 with an external log, I observed this warning in dmesg:

Trying to write to read-only block-device sda4 (partno 4)
WARNING: CPU: 2 PID: 215677 at block/blk-core.c:704 submit_bio_checks+0x256/0x510
Call Trace:
 submit_bio_noacct+0x2c/0x430
 _xfs_buf_ioapply+0x283/0x3c0 [xfs]
 __xfs_buf_submit+0x6a/0x210 [xfs]
 xfs_buf_delwri_submit_buffers+0xf8/0x270 [xfs]
 xfsaild+0x2db/0xc50 [xfs]
 kthread+0x14b/0x170

I think this happened because we tried to cover the log after a readonly mount, and the AIL tried to write the primary superblock to the data device. The test marks the data device readonly, but it doesn't do the same to the external log device. Therefore, XFS thinks that the log is writable, even though AIL writes whine to dmesg because the data device is read only. Fix this by amending xfs_log_writable to prevent writes when the AIL can't possibly write anything into the filesystem. Note: As for the external log or the rt devices being readonly-- xfs_blkdev_get will complain about that if we aren't doing a norecovery mount. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
8646b982 |
|
10-Feb-2021 |
kernel test robot <lkp@intel.com> |
xfs: fix boolreturn.cocci warnings fs/xfs/xfs_log.c:1062:9-10: WARNING: return of 0/1 in function 'xfs_log_need_covered' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci Fixes: 37444fc4cc39 ("xfs: lift writable fs check up into log worker task") CC: Brian Foster <bfoster@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
05a302a1 |
|
22-Jan-2021 |
Darrick J. Wong <djwong@kernel.org> |
xfs: set WQ_SYSFS on all workqueues in debug mode When CONFIG_XFS_DEBUG=y, set WQ_SYSFS on all workqueues that we create so that we (developers) have a means to monitor cpu affinity and whatnot for background workers. In the next patchset we'll expose knobs for more of the workqueues publicly and document it, but not now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
4533fc63 |
|
26-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: fix unused log variable in xfs_log_cover() The log variable is only used in kernels with asserts enabled. Remove it and open code the dereference to avoid unused variable warnings. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
f46e5a17 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: fold sbcount quiesce logging into log covering xfs_log_sbcount() calls xfs_sync_sb() to sync superblock counters to disk when lazy superblock accounting is enabled. This occurs on unmount, freeze, and read-only (re)mount and ensures the final values are calculated and persisted to disk before each form of quiesce completes. Now that log covering occurs in all of these contexts and uses the same xfs_sync_sb() mechanism to update log state, there is no need to log the superblock separately for any reason. Update the log quiesce path to sync the superblock at least once for any mount where lazy superblock accounting is enabled. If the log is already covered, it will remain in the covered state. Otherwise, the next sync as part of the normal covering sequence will carry the associated superblock update with it. Remove xfs_log_sbcount() now that it is no longer needed. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
b0eb9e11 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: don't reset log idle state on covering checkpoints Now that log covering occurs on quiesce, we'd like to reuse the underlying superblock sync for final superblock updates. This includes things like lazy superblock counter updates, log feature incompat bits in the future, etc. One quirk to this approach is that once the log is in the IDLE (i.e. already covered) state, any subsequent log write resets the state back to NEED. This means that a final superblock sync to an already covered log requires two more sb syncs to return the log back to IDLE again. For example, if a lazy superblock enabled filesystem is mount cycled without any modifications, the unmount path syncs the superblock once and writes an unmount record. With the desired log quiesce covering behavior, we sync the superblock three times at unmount time: once for the lazy superblock counter update and twice more to cover the log. By contrast, if the log is active or only partially covered at unmount time, a final superblock sync would doubly serve as the one or two remaining syncs required to cover the log. This duplicate covering sequence is unnecessary because the filesystem remains consistent if a crash occurs at any point. The superblock will either be recovered in the event of a crash or written back before the log is quiesced and potentially cleaned with an unmount record. Update the log covering state machine to remain in the IDLE state if additional covering checkpoints pass through the log. This facilitates final superblock updates (such as lazy superblock counters) via a single sb sync without losing covered status. This provides some consistency with the active and partially covered cases and also avoids harmless, but spurious checkpoints when quiescing the log. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
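The covering state machine change can be illustrated with the simplified sketch below. The enum and function names are hypothetical, and the transition table is a reading of the commit message rather than a copy of the kernel code: a non-covering write always returns the log to NEED, while a covering checkpoint advances DONE to NEED2, DONE2 to IDLE, and now leaves IDLE as IDLE.

```c
#include <stdbool.h>

/* Hypothetical names for the five covering states. */
enum sketch_cover {
	SK_COVER_IDLE,	/* log fully covered */
	SK_COVER_NEED,	/* first covering sync needed */
	SK_COVER_DONE,	/* first covering sync issued */
	SK_COVER_NEED2,	/* second covering sync needed */
	SK_COVER_DONE2,	/* second covering sync issued */
};

/*
 * Next state after a log write completes. covering_write is true when
 * the write consisted only of a covering (dummy superblock) checkpoint.
 */
static enum sketch_cover
sketch_next_cover_state(enum sketch_cover prev, bool covering_write)
{
	if (!covering_write)
		return SK_COVER_NEED;	/* real modifications: start over */

	switch (prev) {
	case SK_COVER_IDLE:
		return SK_COVER_IDLE;	/* the new behaviour: stay covered */
	case SK_COVER_DONE:
		return SK_COVER_NEED2;
	case SK_COVER_DONE2:
		return SK_COVER_IDLE;
	default:
		return SK_COVER_NEED;
	}
}
```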
|
#
303591a0 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: cover the log during log quiesce The log quiesce mechanism historically terminates by marking the log clean with an unmount record. The primary objective is to indicate that log recovery is no longer required after the quiesce has flushed all in-core changes and written back filesystem metadata. While this is perfectly fine, it is somewhat hacky as currently used in certain contexts. For example, filesystem freeze quiesces (i.e. cleans) the log and immediately redirties it with a dummy superblock transaction to ensure that log recovery runs in the event of a crash. While this functions correctly, cleaning the log from freeze context is clearly superfluous given the current redirtying behavior. Instead, the desired behavior can be achieved by simply covering the log. This effectively retires all on-disk log items from the active range of the log by issuing two synchronous and sequential dummy superblock update transactions that serve to update the on-disk log head and tail. The subtle difference is that the log technically remains dirty due to the lack of an unmount record, though recovery is effectively a no-op due to the content of the checkpoints being clean (i.e. the unmodified on-disk superblock). Log covering currently runs in the background and only triggers once the filesystem and log has idled. The purpose of the background mechanism is to prevent log recovery from replaying the most recently logged items long after those items may have been written back. In the quiesce path, the log has been deliberately idled by forcing the log and pushing the AIL until empty in a context where no further mutable filesystem operations are allowed. Therefore, we can cover the log as the final step in the log quiesce codepath to reflect that all previously active items have been successfully written back. This facilitates selective log covering from certain contexts (i.e. freeze) that only seek to quiesce, but not necessarily clean the log. 
Note that as a side effect of this change, log covering now occurs when cleaning the log as well. This is harmless, facilitates subsequent cleanups, and is mostly temporary as various operations switch to use explicit log covering. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
|
#
9e54ee0f |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: separate log cleaning from log quiesce Log quiesce is currently associated with cleaning the log, which is accomplished by writing an unmount record as the last step of the quiesce sequence. The quiesce codepath is a bit convoluted in this regard due to how it is reused from various contexts. In preparation to create separate log cleaning and log covering interfaces, lift the write of the unmount record into a new cleaning helper and call that wherever xfs_log_quiesce() is currently invoked. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
37444fc4 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: lift writable fs check up into log worker task The log covering helper checks whether the filesystem is writable to determine whether to cover the log. The helper is currently only called from the background log worker. In preparation to reuse the helper from freezing contexts, lift the check into xfs_log_worker(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
50d25484 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: sync lazy sb accounting on quiesce of read-only mounts xfs_log_sbcount() syncs the superblock specifically to accumulate the in-core percpu superblock counters and commit them to disk. This is required to maintain filesystem consistency across quiesce (freeze, read-only mount/remount) or unmount when lazy superblock accounting is enabled because individual transactions do not update the superblock directly. This mechanism works as expected for writable mounts, but xfs_log_sbcount() skips the update for read-only mounts. Read-only mounts otherwise still allow log recovery and write out an unmount record during log quiesce. If a read-only mount performs log recovery, it can modify the in-core superblock counters and write an unmount record when the filesystem unmounts without ever syncing the in-core counters. This leaves the filesystem with a clean log but in an inconsistent state with regard to lazy sb counters. Update xfs_log_sbcount() to use the same logic xfs_log_unmount_write() uses to determine when to write an unmount record. This ensures that lazy accounting is always synced before the log is cleaned. Refactor this logic into a new helper to distinguish between a writable filesystem and a writable log. Specifically, the log is writable unless the filesystem is mounted with the norecovery mount option, the underlying log device is read-only, or the filesystem is shutdown. Drop the freeze state check because the update is already allowed during the freezing process and no context calls this function on an already frozen fs. Also, retain the shutdown check in xfs_log_unmount_write() to catch the case where the preceding log force might have triggered a shutdown. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
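The writable-log rule spelled out in this commit message reduces to a small predicate. The sketch below uses a hypothetical function name and plain boolean parameters in place of the mount and log state the kernel actually inspects.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: the log is writable unless the filesystem was
 * mounted norecovery, the log device is read-only, or the filesystem has
 * shut down. Note that a read-only *mount* does not appear here, which is
 * the distinction between a writable filesystem and a writable log.
 */
static bool
sketch_log_writable(bool norecovery, bool log_dev_readonly, bool shutdown)
{
	if (norecovery)
		return false;
	if (log_dev_readonly)
		return false;
	if (shutdown)
		return false;
	return true;
}
```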
|
#
8321ddb2 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: don't drain buffer lru on freeze and read-only remount xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the buffer cache is reclaimed during unmount. xfs_log_quiesce() is also called from xfs_quiesce_attr(), however, which means that cache state is completely drained for filesystem freeze and read-only remount. While technically harmless, this is unnecessarily heavyweight. Both freeze and read-only mounts allow reads and thus allow population of the buffer cache. Therefore, the transitional sequence in either case really only needs to quiesce outstanding writes to return the filesystem in a generally read-only state. Additionally, some users have reported that attempts to freeze a filesystem concurrent with a read-heavy workload causes the freeze process to stall for a significant amount of time. This occurs because, as mentioned above, the read workload repopulates the buffer LRU while the freeze task attempts to drain it. To improve this situation, replace the drain in xfs_log_quiesce() with a buffer I/O quiesce and lift the drain into the unmount path. This removes buffer LRU reclaim from freeze and read-only [re]mount, but ensures the LRU is still drained before the filesystem unmounts. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
10fb9ac1 |
|
22-Jan-2021 |
Brian Foster <bfoster@redhat.com> |
xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain() xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its primary purpose is to reclaim all buffers from the provided buffer target LRU. In preparation to refactor xfs_wait_buftarg() into serialization and LRU draining components, rename the function and associated helpers to something more descriptive. This patch has no functional changes with the minor exception of renaming a tracepoint. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
#
ed1575da |
|
25-Sep-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: expose the log push threshold Separate the computation of the log push threshold and the push logic in xlog_grant_push_ail. This enables higher level code to determine (for example) that it is holding on to a logged intent item and the log is so busy that it is more than 75% full. In that case, it would be desirable to move the log item towards the head to release the tail, which we will cover in the next patch. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
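As a rough illustration of separating the threshold computation from the push itself, the sketch below computes a free-space threshold and lets the caller decide what to do with it. The quarter-of-the-log rule mirrors the "more than 75% full" figure in the message; the 256-block floor and all names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_FREE_BLOCKS 256u /* assumed floor, for illustration only */

/* Hypothetical view of log space, in blocks. */
struct log_space {
	uint32_t size_blocks; /* total size of the log */
	uint32_t free_blocks; /* space not currently reserved */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Threshold: largest of the caller's need, a quarter of the log, the floor. */
static uint32_t push_free_threshold(const struct log_space *log,
				    uint32_t need_blocks)
{
	uint32_t threshold = need_blocks;

	assert(log->free_blocks <= log->size_blocks);
	threshold = max_u32(threshold, log->size_blocks / 4);
	threshold = max_u32(threshold, MIN_FREE_BLOCKS);
	return threshold;
}

/* Higher-level code can ask this without triggering any push. */
static bool log_needs_push(const struct log_space *log, uint32_t need_blocks)
{
	return log->free_blocks < push_free_threshold(log, need_blocks);
}
```

With the computation exposed on its own, a caller holding a logged intent item can check `log_needs_push()` and choose to relog the item instead of pushing.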
|
#
0c771b99 |
|
22-Sep-2020 |
Gao Xiang <hsiangkao@redhat.com> |
xfs: clean up calculation of LR header blocks Let's use DIV_ROUND_UP() to calculate log record header blocks, as is done in xlog_get_iclog_buffer_size(), and wrap it up in a common helper for log recovery. Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
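A minimal sketch of the rounding this commit describes, assuming each header block covers 32 KiB of record data and that only version 2 logs use more than one header block. The constant, the function name and its parameters are assumptions for this example, not the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed amount of record data covered by one header block. */
#define HEADER_CYCLE_SIZE (32u * 1024u)

/* Number of header blocks needed for a log record of record_size bytes. */
static uint32_t logrec_header_blocks(uint32_t record_size, bool is_v2_log)
{
	assert(record_size > 0);
	if (is_v2_log && record_size > HEADER_CYCLE_SIZE)
		return DIV_ROUND_UP(record_size, HEADER_CYCLE_SIZE);
	return 1;
}
```

Rounding up matters for sizes that are not an exact multiple of the cycle size: one extra byte past 32 KiB already requires a second header block.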
|
#
ca4f2589 |
|
22-Jul-2020 |
Carlos Maiolino <cmaiolino@redhat.com> |
xfs: Modify xlog_ticket_alloc() to use kernel's MM API xlog_ticket_alloc() is always called under NOFS context, except from the unmount path, which either way is holding many FS locks, so there is no need for its callers to keep passing allocation flags into it. Change xlog_ticket_alloc() to use the default kmem_cache_zalloc(), remove its alloc_flags argument, and always use GFP_NOFS | __GFP_NOFAIL flags. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
5cc3c006 |
|
26-Mar-2020 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: don't write a corrupt unmount record to force summary counter recalc In commit f467cad95f5e3, I added the ability to force a recalculation of the filesystem summary counters if they seemed incorrect. This was done (not entirely correctly) by tweaking the log code to write an unmount record without the UMOUNT_TRANS flag set. At next mount, the log recovery code will fail to find the unmount record and go into recovery, which triggers the recalculation. What actually gets written to the log is what ought to be an unmount record, but without any flags set to indicate what kind of record it actually is. This worked to trigger the recalculation, but we shouldn't write bogus log records when we could simply write nothing. Fixes: f467cad95f5e3 ("xfs: force summary counter recalc at next mount") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
2def2845 |
|
24-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't allow log IO to be throttled Running metadata intensive workloads, I've been seeing the AIL pushing getting stuck on pinned buffers and triggering log forces. The log force is taking a long time to run because the log IO is getting throttled by wbt_wait() - the block layer writeback throttle. It's being throttled because there is a huge amount of metadata writeback going on which is filling the request queue. IOWs, we have a priority inversion problem here. Mark the log IO bios with REQ_IDLE so they don't get throttled by the block layer writeback throttle. When we are forcing the CIL, we are likely to need to do tens of log IOs, and they are issued as fast as they can be built and IO completed. Hence REQ_IDLE is appropriate - it's an indication that more IO will follow shortly. And because we also set REQ_SYNC, the writeback throttle will now treat log IO the same way it treats direct IO writes - it will not throttle them at all. Hence we solve the priority inversion problem caused by the writeback throttle being unable to distinguish between high priority log IO and background metadata writeback. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
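The flag combination described above can be modelled with a small userspace sketch. The bit values and function names here are hypothetical stand-ins, not the block layer's definitions; the only point illustrated is that a write carrying both the sync and idle hints is exempted from the throttle, as the message says happens for direct IO writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; the kernel's values differ. */
#define FLAG_SYNC (1u << 0)
#define FLAG_IDLE (1u << 1)
#define FLAG_KNOWN (FLAG_SYNC | FLAG_IDLE)

/* Flags this sketch assumes are set on log writes after the change. */
static uint32_t log_io_flags(void)
{
	return FLAG_SYNC | FLAG_IDLE;
}

/* A write is throttled unless it carries both hints. */
static bool writeback_throttle_applies(uint32_t flags)
{
	assert((flags & ~FLAG_KNOWN) == 0);
	return (flags & (FLAG_SYNC | FLAG_IDLE)) != (FLAG_SYNC | FLAG_IDLE);
}
```

In this model, setting only one of the two hints leaves the write subject to throttling, which is why the idle hint has to be added on top of the sync hint.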
|
#
b843299b |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove some stale comments from the log code Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3c702f95 |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: refactor unmount record writing Separate out the unmount record writing from the rest of the ticket and log state futzing necessary to make it work. This is a no-op, just makes the code cleaner and places the unmount record formatting and writing alongside the commit record formatting and writing code. We can also get rid of the ticket flag clearing before the xlog_write() call because it no longer cares about the state of XLOG_TIC_INITED. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f10e925d |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: merge xlog_commit_record with xlog_write_done xlog_write_done() is just a thin wrapper around xlog_commit_record(), so they can be merged together easily. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8b41e3f9 |
|
25-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: split xlog_ticket_done Remove xlog_ticket_done and just call the renamed low-level helpers for ungranting or regranting log space directly. To make that a little easier, the reference put on the ticket and all tracing are moved into the actual helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
70e42f2d |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: kill XLOG_TIC_INITED It is no longer used or checked by anything, so remove the last traces from the log ticket code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
dd401770 |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: refactor and split xfs_log_done() xfs_log_done() does two separate things. Firstly, it triggers commit records to be written for permanent transactions, and secondly it releases or regrants transaction reservation space. Since delayed logging was introduced, transactions no longer write directly to the log, hence they never have the XLOG_TIC_INITED flag cleared on them. Hence transactions never write commit records to the log and only need to modify reservation space. Split up xfs_log_done into two parts, and only call the parts of the operation needed for the context xfs_log_done() is currently being called from. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9590e9c6 |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: re-order initial space accounting checks in xlog_write Commit and unmount records do not need start records to be written, so rearrange the logic in xlog_write() to remove the need to check for XLOG_TIC_INITED to determine if we should account for the space used by a start record. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7ec94921 |
|
25-Mar-2020 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't try to write a start record into every iclog The xlog_write() function iterates over iclogs until it completes writing all the log vectors passed in. The ticket tracks whether a start record has been written or not, so only the first iclog gets a start record. We only ever pass single use tickets to xlog_write() so we only ever need to write a start record once per xlog_write() call. Hence we don't need to store whether we should write a start record in the ticket as the callers provide all the information we need to determine if a start record should be written. For the moment, we have to ensure that we clear the XLOG_TIC_INITED appropriately so the code in xfs_log_done() still works correctly for committing transactions. (darrick: Note the slight behavior change that we always deduct the size of the op header from the ticket, even for unmount records) Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: pass an explicit need_start_rec argument] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
842a42d1 |
|
25-Mar-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: shutdown on failure to add page to log bio If the bio_add_page() call fails, we proceed to write out a partially constructed log buffer. This corrupts the physical log such that log recovery is not possible. Worse, persistent occurrences of this error eventually lead to a BUG_ON() failure in bio_split() as iclogs wrap the end of the physical log, which triggers log recovery on subsequent mount. Rather than warn about writing out a corrupted log buffer, shutdown the fs as is done for any log I/O related error. This preserves the consistency of the physical log such that log recovery succeeds on a subsequent mount. Note that this was observed on a 64k page debug kernel without upstream commit 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), which demonstrated frequent iclog bio overflows due to unaligned (slab allocated) iclog data buffers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
69363999 |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xlog_state_want_sync Open code the xlog_state_want_sync logic in its two callers given that this function is a trivial wrapper around xlog_state_switch_iclogs. Move the lockdep assert into xlog_state_switch_iclogs to not lose this debugging aid, and improve the comment that documents xlog_state_switch_iclogs as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5781464b |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: move the ioerror check out of xlog_state_clean_iclog Use the shutdown flag in the log to bypass xlog_state_clean_iclog entirely in case of a shut down log. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c814b4f2 |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xlog_state_clean_iclog Factor out a few self-contained helpers from xlog_state_clean_iclog, and update the documentation so it primarily documents why things happen instead of how. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
12e6a0f4 |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the aborted parameter to xlog_state_done_syncing We can just check for a shut down log all the way down in xlog_cil_committed instead of passing the parameter. This means a slight behavior change in that we now also abort log items if the shutdown came in halfway into the I/O completion processing, which actually is the right thing to do. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a582f32f |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: simplify log shutdown checking in xfs_log_release_iclog There is no need to check for the ioerror state before the lock, as the shutdown case is not a fast path. Also remove the call to force shutdown the file system, as it must have been shut down already for an iclog to be in the ioerror state. Also clean up the flow of the function a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f97a43e4 |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: simplify the xfs_log_release_iclog calling convention The only caller of xfs_log_release_iclog doesn't care about the return value, so remove it. Also don't bother passing the mount pointer, given that we can trivially derive it from the iclog. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
81e5b50a |
|
20-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out a xlog_wait_on_iclog helper Factor out the shared code to wait for a log force into a new helper. This helper uses the XLOG_FORCED_SHUTDOWN check previously only used by the unmount code over the equivalent iclog ioerror state used by the other two functions. There is a slight behavior change in that the force of the unmount record is now accounted in the log force statistics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
13859c98 |
|
12-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup xfs_log_unmount_write Move the code for verifying the iclog state on a clean unmount into a helper, and instead of checking the iclog state just rely on the shutdown check as they are equivalent. Also remove the ifdef DEBUG as the compiler is smart enough to eliminate the dead code for non-debug builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6178d104 |
|
12-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove dead code from xfs_log_unmount_write When the log is shut down all iclogs are in the XLOG_STATE_IOERROR state, which means that xlog_state_want_sync and xlog_state_release_iclog are no-ops. Remove the whole section of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
550319e9 |
|
12-Mar-2020 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the unused return value from xfs_log_unmount_write Remove the ignored return value from xfs_log_unmount_write, and also remove a rather pointless assert on the return value from xfs_log_force. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6b789c33 |
|
21-Feb-2020 |
Brian Foster <bfoster@redhat.com> |
xfs: fix iclog release error check race with shutdown Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with l_icloglock held"), xlog_state_release_iclog() always performed a locked check of the iclog error state before proceeding into the sync state processing code. As of this commit, part of xlog_state_release_iclog() was open-coded into xfs_log_release_iclog() and as a result the locked error state check was lost. The lockless check still exists, but this doesn't account for the possibility of a race with a shutdown being performed by another task causing the iclog state to change while the original task waits on ->l_icloglock. This has reproduced very rarely via generic/475 and manifests as an assert failure in __xlog_state_release_iclog() due to an unexpected iclog state. Restore the locked error state check in xlog_state_release_iclog() to ensure that an iclog state update via shutdown doesn't race with the iclog release state processing code. Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held") Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
798a9cad |
|
03-Dec-2019 |
Brian Foster <bfoster@redhat.com> |
xfs: fix mount failure crash on invalid iclog memory access syzbot (via KASAN) reports a use-after-free in the error path of xlog_alloc_log(). Specifically, the iclog freeing loop doesn't handle the case of a fully initialized ->l_iclog linked list. Instead, it assumes that the list is partially constructed and NULL terminated. This bug manifested because there was no possible error scenario after iclog list setup when the original code was added. Subsequent code and associated error conditions were added some time later, while the original error handling code was never updated. Fix up the error loop to terminate either on a NULL iclog or reaching the end of the list. Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
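The fixed termination condition can be sketched as a loop that frees either a partially built, NULL-terminated list or a fully linked ring. This is a userspace illustration only: `struct iclog`, `free_iclogs()` and `build_iclogs()` are hypothetical and use `malloc`-family calls rather than kernel allocators. The head is freed last so that no freed pointer is ever compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an in-core log buffer on a singly linked ring. */
struct iclog {
	struct iclog *next;
};

/*
 * Free every node reachable from head and return how many were freed.
 * The walk stops on a NULL link (partially built list) or when it gets
 * back to head (fully built ring).
 */
static int free_iclogs(struct iclog *head)
{
	struct iclog *cur, *next;
	int freed = 0;

	if (head == NULL)
		return 0;
	for (cur = head->next; cur != NULL && cur != head; cur = next) {
		next = cur->next;
		free(cur);
		freed++;
	}
	free(head);
	return freed + 1;
}

/* Build n nodes; link the tail back to the head only when closed is set. */
static struct iclog *build_iclogs(int n, int closed)
{
	struct iclog *head = NULL, *tail = NULL;

	for (int i = 0; i < n; i++) {
		struct iclog *node = calloc(1, sizeof(*node));

		assert(node != NULL);
		if (tail != NULL)
			tail->next = node;
		else
			head = node;
		tail = node;
	}
	if (closed && tail != NULL)
		tail->next = head;
	return head;
}
```

A loop that only checked for NULL would walk around a closed ring and touch nodes it had already freed, which is the use-after-free the report describes.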
|
#
377bcd5f |
|
14-Nov-2019 |
Carlos Maiolino <cmaiolino@redhat.com> |
xfs: Remove kmem_zone_free() wrapper We can remove it now, without needing to rework the KM_ flags. Use kmem_cache_free() directly. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
f7559793 |
|
06-Nov-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: annotate functions that trip static checker locking checks Add some lock annotations to helper functions that seem to have unbalanced locking that confuses the static analyzers. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e1d3d218 |
|
04-Nov-2019 |
Ian Kent <raven@themaw.net> |
xfs: use super s_id instead of struct xfs_mount m_fsname Eliminate struct xfs_mount field m_fsname by using the super block s_id field directly. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4b29ab04 |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the XLOG_STATE_DO_CALLBACK state XLOG_STATE_DO_CALLBACK is only entered through XLOG_STATE_DONE_SYNC and just used in a single debug check. Remove the flag and thus simplify the calling conventions for xlog_state_do_callback and xlog_state_iodone_process_iclog. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
1858bb0b |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: turn ic_state into an enum ic_state really is a set of different states, even if the values are encoded as non-conflicting bits and we sometimes use logical and operations to check for them. Switch all comparisons to check for exact values (and use switch statements in a few places to make it more clear) and turn the values into an implicitly enumerated enum type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
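The difference between bit-style states and an implicitly enumerated type can be sketched as follows. The state names are hypothetical stand-ins chosen for this example; the point is only that each check compares against an exact value or uses a `switch`, never a bitmask test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names; values are assigned implicitly (0, 1, 2, ...). */
enum iclog_state {
	ICLOG_ACTIVE,
	ICLOG_WANT_SYNC,
	ICLOG_SYNCING,
	ICLOG_DONE_SYNC,
	ICLOG_DIRTY,
	ICLOG_IOERROR,
};

/* Exact comparison: only an ACTIVE iclog accepts new writes. */
static bool iclog_accepts_writes(enum iclog_state state)
{
	return state == ICLOG_ACTIVE;
}

/* A switch makes the handling of every state explicit. */
static bool iclog_io_pending(enum iclog_state state)
{
	assert(state <= ICLOG_IOERROR);
	switch (state) {
	case ICLOG_WANT_SYNC:
	case ICLOG_SYNCING:
		return true;
	case ICLOG_ACTIVE:
	case ICLOG_DONE_SYNC:
	case ICLOG_DIRTY:
	case ICLOG_IOERROR:
		return false;
	}
	return false;
}
```

With implicit values a test such as `state & ICLOG_ACTIVE` would always be zero, so every former mask check has to become an equality check or a `switch` case.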
|
#
032cc34e |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove dead ifdef XFSERRORDEBUG code XFSERRORDEBUG is never set and the code isn't all that useful, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
df732b29 |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: call xlog_state_release_iclog with l_icloglock held All but one caller of xlog_state_release_iclog hold l_icloglock and need to drop and reacquire it to call xlog_state_release_iclog. Switch the xlog_state_release_iclog calling conventions to expect the lock to be held, and open code the logic (using a shared helper) in the only remaining caller that does not have the lock (and where not holding it is a nice performance optimization). Also move the refactored code to require the least amount of forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor whitespace cleanup] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
390aab0a |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: move the locking from xlog_state_finish_copy to the callers This will allow optimizing various locking cycles in the following patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
2c68a1df |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the unused ic_io_size field from xlog_in_core ic_io_size is only used inside xlog_write_iclog, where we can just use the count parameter instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
cd95cb96 |
|
14-Oct-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: pass the correct flag to xlog_write_iclog xlog_write_iclog expects a bool for the second argument. While any non-0 value happens to work fine this makes all calls consistent. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
3219e8cf |
|
04-Oct-2019 |
Bill O'Donnell <billodo@redhat.com> |
xfs: assure zeroed memory buffers for certain kmem allocations Guarantee zeroed memory buffers for cases where potential memory leak to disk can occur. In these cases, kmem_alloc is used and doesn't zero the buffer, opening the possibility of information leakage to disk. Use existing infrastructure (xfs_buf_allocate_memory) to obtain the already zeroed buffer from kernel memory. This solution avoids the performance issue that would occur if a wholesale change to replace kmem_alloc with kmem_zalloc was done. Signed-off-by: Bill O'Donnell <billodo@redhat.com> [darrick: fix bitwise complaint about kmflag_mask] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
14e15f1b |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: push the grant head when the log head moves forward When the log fills up, we can get into the state where the outstanding items in the CIL being committed and aggregated are larger than the range that the reservation grant head tail pushing will attempt to clean. This can result in the tail pushing range being trimmed back to the log head (l_last_sync_lsn) and so may not actually move the push target at all. When the iclogs associated with the CIL commit finally land, the log head moves forward, and this removes the restriction on the AIL push target. However, if we already have transactions sleeping on the grant head, and there's nothing in the AIL still to flush from the current push target, then nothing will move the tail of the log and trigger a log reservation wakeup. Hence there is nothing that will trigger xlog_grant_push_ail() to recalculate the AIL push target and start pushing on the AIL again to write back the metadata objects that pin the tail of the log and hence free up space and allow the transaction reservations to be woken and make progress. Hence we need to push on the grant head when we move the log head forward, as this may be the only trigger we have that can move the AIL push target forwards in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0383f543 |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: push iclog state cleaning into xlog_state_clean_log xlog_state_clean_log() is only called from one place, and it occurs when an iclog is transitioning back to ACTIVE. Prior to calling xlog_state_clean_log, the iclog we are processing has a hard coded state check to DIRTY so that xlog_state_clean_log() processes it correctly. We also have a hard coded wakeup after xlog_state_clean_log() to ensure log force waiters on that iclog are woken correctly. Both of these things are operations required to finish processing an iclog and return it to the ACTIVE state again, so they make little sense to be separated from the rest of the clean state transition code. Hence push these things inside xlog_state_clean_log(), document the behaviour and rename it xlog_state_clean_iclog() to indicate that it's being driven by an iclog state change and does the iclog state change work itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
5e96fa8d |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor iclog state processing out of xlog_state_do_callback() The iclog IO completion state processing is somewhat complex, and because it's inside two nested loops it is highly indented and very hard to read. Factor it out, flatten the logic flow and clean up the comments so that it is much easier to see what the code is doing both in processing the individual iclogs and in the overall xlog_state_do_callback() operation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6546818c |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor callbacks out of xlog_state_do_callback() Simplify the code flow by lifting the iclog callback work out of the main iclog iteration loop. This isolates the log juggling and callbacks from the iclog state change logic in the loop. Note that the loopdidcallbacks variable is not actually tracking whether callbacks are actually run - it is tracking whether the icloglock was dropped during the loop and so determines if we completed the entire iclog scan loop atomically. Hence we know for certain there are either no more ordered completions to run or that the next completion will run the remaining ordered iclog completions. Hence rename that variable appropriately for its function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6769aa2a |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor debug code out of xlog_state_do_callback() Start making this function readable by lifting the debug code into a conditional function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
cdea5459 |
|
05-Sep-2019 |
Rik van Riel <riel@surriel.com> |
xfs: fix missed wakeup on l_flush_wait The code in xlog_wait uses the spinlock to make adding the task to the wait queue, and setting the task state to UNINTERRUPTIBLE atomic with respect to the waker. Doing the wakeup after releasing the spinlock opens up the following race condition: task 1 adds itself to the wait queue, task 2 wakes up the task, and only then does task 1 set its state to UNINTERRUPTIBLE. This issue was found through code inspection as a result of kworkers being observed stuck in UNINTERRUPTIBLE state with an empty wait queue. It is rare and largely unreproducible. Simply moving the spin_unlock to after the wake_up_all results in the waker not being able to see a task on the waitqueue before it has set its state to UNINTERRUPTIBLE. This bug dates back to the conversion of this code to generic waitqueue infrastructure from a counting semaphore back in 2008 which didn't place the wakeups consistently w.r.t. the relevant spin locks. [dchinner: Also fix a similar issue in the shutdown path on xc_commit_wait. Update commit log with more details of the issue.] Fixes: d748c62367eb ("[XFS] Convert l_flushsema to a sv_t") Reported-by: Chris Mason <clm@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7c107afb |
|
05-Sep-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: push the AIL in xlog_grant_head_wake In the situation where the log is full and the CIL has not recently flushed, the AIL push threshold is throttled back to where the last write of the head of the log was completed. This is stored in log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space pinned by flushes and/or aggregation in progress, we can get the situation where the head of the log lags a long way behind the reservation grant head. When this happens, the AIL push target is trimmed back from where the reservation grant head wants to push the log tail to, back to where the head of the log currently is. This means the push target doesn't reach far enough into the log to actually move the tail before the transaction reservation goes to sleep. When the CIL push completes, it moves the log head forward such that the AIL push target can now be moved, but that has no mechanism for pushing the log tail. Further, if the next tail movement of the log is not large enough to wake the waiter (i.e. still not enough space for it to have a reservation granted), we don't wake anything up, and hence we do not update the AIL push target to take into account the head of the log moving and allowing the push target to be moved forwards. To avoid this particular condition, if we fail to wake the first waiter on the grant head because we don't have enough space, push on the AIL again. This will pick up any movement of the log head and allow the push target to move forward due to completion of CIL pushing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
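The wake loop with the extra push can be sketched in userspace C. This is a simplified model under assumptions: `struct waiter` and `grant_head_wake()` are hypothetical, waking a task is reduced to counting it, and the AIL push is reduced to setting a flag the caller would act on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical waiter: a queued reservation needing need_bytes of log space. */
struct waiter {
	uint32_t need_bytes;
};

/*
 * Walk the waiters in queue order and "wake" each one that fits in the
 * remaining free space. If even the first waiter does not fit, request
 * another AIL push so the push target is recalculated.
 * Returns the number of waiters woken.
 */
static int grant_head_wake(const struct waiter *waiters, size_t count,
			   uint32_t free_bytes, bool *need_ail_push)
{
	int woken = 0;

	assert(need_ail_push != NULL);
	*need_ail_push = false;
	for (size_t i = 0; i < count; i++) {
		if (free_bytes < waiters[i].need_bytes) {
			if (woken == 0)
				*need_ail_push = true;
			break;
		}
		free_bytes -= waiters[i].need_bytes;
		woken++;
	}
	return woken;
}
```

The push is requested only when nothing was woken: if at least one waiter made progress, that waiter's own activity will move the log along.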
|
#
f8f9ee47 |
|
26-Aug-2019 |
Dave Chinner <dchinner@redhat.com> |
xfs: add kmem_alloc_io() Memory we use to submit for IO needs strict alignment to the underlying driver constraints. Worst case, this is 512 bytes. Given that all allocations for IO are always a power of 2 multiple of 512 bytes, the kernel heap provides natural alignment for objects of these sizes and that suffices. Until, of course, memory debugging of some kind is turned on (e.g. red zones, poisoning, KASAN) and then the alignment of the heap objects is thrown out the window. Then we get weird IO errors and data corruption problems because drivers don't validate alignment and do the wrong thing when passed unaligned memory buffers in bios. To fix this, introduce kmem_alloc_io(), which will guarantee at least 512 byte alignment of buffers for IO, even if memory debugging options are turned on. It is assumed that the minimum allocation size will be 512 bytes, and that sizes will be power of 2 multiples of 512 bytes. Use this everywhere we allocate buffers for IO. This no longer fails with log recovery errors when KASAN is enabled due to the brd driver not handling unaligned memory buffers: # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
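The alignment and size assumptions stated in this commit can be written down as simple arithmetic. The following is a minimal sketch using plain integers as addresses; the helper names, the example address and the 16-byte red zone are hypothetical and do not come from the kernel.

```python
IO_ALIGN = 512  # worst-case alignment quoted in the commit message


def is_valid_io_size(size):
    """True for power-of-2 multiples of 512 bytes: 512, 1024, 2048, ..."""
    return size >= IO_ALIGN and (size & (size - 1)) == 0


def is_io_aligned(addr):
    """True if an address satisfies the 512-byte alignment requirement."""
    return addr % IO_ALIGN == 0


def align_up(addr, align=IO_ALIGN):
    """Round an address up to the next multiple of a power-of-2 alignment."""
    return (addr + align - 1) & ~(align - 1)


# A naturally aligned heap object, then the same object shifted by a
# hypothetical 16-byte debugging red zone placed in front of it.
natural = 0x10000
with_redzone = natural + 16

print(is_io_aligned(natural))        # aligned
print(is_io_aligned(with_redzone))   # no longer aligned
print(hex(align_up(with_redzone)))   # next address that is aligned again
```

The sketch only shows why a small header in front of an object breaks natural alignment; how kmem_alloc_io() actually obtains aligned memory is not modelled here.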
|
#
707e0dda |
|
26-Aug-2019 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
fs: xfs: Remove KM_NOSLEEP and KM_SLEEP. Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP, we can remove KM_NOSLEEP and replace KM_SLEEP with 0. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
294fc7a4 |
|
01-Aug-2019 |
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> |
fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve(). When the system is close-to-OOM, fsync() may fail due to -ENOMEM because xfs_log_reserve() is using KM_MAYFAIL. It is a bad thing to fail writeback operation due to user-triggerable OOM condition. Since we are not using KM_MAYFAIL at xfs_trans_alloc() before calling xfs_log_reserve(), let's use the same flags at xfs_log_reserve(). oom-torture: page allocation failure: order:0, mode:0x46c40(GFP_NOFS|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_COMP), nodemask=(null) CPU: 7 PID: 1662 Comm: oom-torture Kdump: loaded Not tainted 5.3.0-rc2+ #925 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 Call Trace: dump_stack+0x67/0x95 warn_alloc+0xa9/0x140 __alloc_pages_slowpath+0x9a8/0xbce __alloc_pages_nodemask+0x372/0x3b0 alloc_slab_page+0x3a/0x8d0 new_slab+0x330/0x420 ___slab_alloc.constprop.94+0x879/0xb00 __slab_alloc.isra.89.constprop.93+0x43/0x6f kmem_cache_alloc+0x331/0x390 kmem_zone_alloc+0x9f/0x110 [xfs] kmem_zone_alloc+0x9f/0x110 [xfs] xlog_ticket_alloc+0x33/0xd0 [xfs] xfs_log_reserve+0xb4/0x410 [xfs] xfs_trans_reserve+0x1d1/0x2b0 [xfs] xfs_trans_alloc+0xc9/0x250 [xfs] xfs_setfilesize_trans_alloc.isra.27+0x44/0xc0 [xfs] xfs_submit_ioend.isra.28+0xa5/0x180 [xfs] xfs_vm_writepages+0x76/0xa0 [xfs] do_writepages+0x17/0x80 __filemap_fdatawrite_range+0xc1/0xf0 file_write_and_wait_range+0x53/0xa0 xfs_file_fsync+0x87/0x290 [xfs] vfs_fsync_range+0x37/0x80 do_fsync+0x38/0x60 __x64_sys_fsync+0xf/0x20 do_syscall_64+0x4a/0x1c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: eb01c9cd87 ("[XFS] Remove the xlog_ticket allocator") Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a7a9250e |
|
03-Jul-2019 |
Hariprasad Kelam <hariprasad.kelam@gmail.com> |
fs: xfs: xfs_log: Change return type from int to void Change the return types of the functions below, as they never fail: xfs_log_mount_cancel, xlog_recover_cancel, xlog_recover_cancel_intents. This fixes the following issue reported by coccicheck: fs/xfs/xfs_log_recover.c:4886:7-12: Unneeded variable: "error". Return "0" on line 4926 Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
89b171ac |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: fix iclog allocation size Properly allocate the space for the bio_vecs instead of just one byte per bio_vec. Fixes: 79b54d9bfcdcd0a ("xfs: use bios directly to write log buffers") Reported-by: syzbot+b75afdbe271a0d7ac4f6@syzkaller.appspotmail.com Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
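The size error fixed here is easy to quantify. The sketch below uses a ctypes stand-in for struct bio_vec (a page pointer plus a length and an offset); the `BioVec` class, the helper names and the vector count of 8 are illustrative assumptions, and the exact element size depends on the platform.

```python
import ctypes


class BioVec(ctypes.Structure):
    # Stand-in for struct bio_vec; the real definition lives in the kernel.
    _fields_ = [("bv_page", ctypes.c_void_p),
                ("bv_len", ctypes.c_uint),
                ("bv_offset", ctypes.c_uint)]


def buggy_vec_bytes(nr_vecs):
    """Space reserved when only one byte is counted per bio_vec."""
    return nr_vecs


def fixed_vec_bytes(nr_vecs):
    """Space reserved when each entry gets sizeof(struct bio_vec) bytes."""
    return nr_vecs * ctypes.sizeof(BioVec)


nr_vecs = 8  # hypothetical number of vectors for one log buffer
shortfall = fixed_vec_bytes(nr_vecs) - buggy_vec_bytes(nr_vecs)
print(f"array needs {fixed_vec_bytes(nr_vecs)} bytes, "
      f"buggy size is {buggy_vec_bytes(nr_vecs)} bytes, "
      f"shortfall {shortfall} bytes")
```

Any write to the array beyond the first few bytes would land outside the allocation in the buggy case, which is consistent with the report having come from syzbot.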
|
#
250d4b4c |
|
28-Jun-2019 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove unused header files There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
89ae379d |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: use a list_head for iclog callbacks Replace the hand grown linked list handling and cil context attachment with the standard list_head structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d15cbf2f |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: stop using XFS_LI_ABORTED as a parameter flag Just pass a straight bool aborted instead of abusing XFS_LI_ABORTED as a flag in function parameters. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1058d0f5 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: move the log ioend workqueue to struct xlog Move the workqueue used for log I/O completions from struct xfs_mount to struct xlog to keep it self contained in the log code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: destroy the log workqueue after ensuring log ios are done] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
79b54d9b |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: use bios directly to write log buffers Currently the XFS logging code uses the xfs_buf structure and associated APIs to write the log buffers to disk. This requires various special cases in the log code and is generally not very optimal. Instead of using a buffer just allocate a kmem_alloc_larger region for each log buffer, and use a bio and bio_vec array embedded in the iclog structure to write the buffer to disk. This also allows for using the bio split and chaining case to deal with the case of a log buffer wrapping around the end of the log. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: don't split if/else with an #endif] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2d15d2c0 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: make use of the l_targ field in struct xlog Use the slightly shorter way to get at the buftarg for the log device wherever we can in the log and log recovery code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
abca1f33 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the syncing argument from xlog_verify_iclog The only caller unconditionally passes true here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9b0489c1 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: update both stat counters together in xlog_sync Just a small bit of code tidying up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
db0a6faf |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out iclog size calculation from xlog_sync Split out another self-contained bit of code from xlog_sync. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
56933848 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out splitting of an iclog from xlog_sync Split out a self-contained chunk of code from xlog_sync that calculates the split offset for an iclog that wraps the log end and bumps the cycles for the second half. Use the chance to bring some sanity to the variables used to track the split in xlog_sync by not changing the count variable, and instead use split as the offset for the split and use those to calculate the sizes and offsets for the two write buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
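The split calculation described above can be sketched as follows. This is a simplified model in byte units with a hypothetical function name; it leaves `count` untouched and derives both pieces from a single `split` offset, as the commit message describes, but it does not model the cycle number bump for the second half.

```python
def split_log_write(start, count, log_size):
    """Return (offset, length) pieces for a write of count bytes at start.

    A write that would run past the end of the circular log is split into
    a piece up to the end of the log and a piece at the start of the log.
    """
    if not 0 <= start < log_size:
        raise ValueError("start must lie inside the log")
    if not 0 < count <= log_size:
        raise ValueError("count must be positive and fit in the log")

    if start + count <= log_size:
        return [(start, count)]          # no wrap, single write

    split = log_size - start             # bytes that fit before the end
    return [(start, split), (0, count - split)]


print(split_log_write(0, 100, 1000))     # fits, one piece
print(split_log_write(900, 200, 1000))   # wraps, two pieces
```

Keeping `count` constant and expressing both pieces in terms of `split` means the total length can always be checked as the sum of the two piece lengths.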
|
#
94860a30 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: factor out log buffer writing from xlog_sync Replace the not very useful xlog_bdstrat wrapper with a new version that takes care of all the common logic for writing log buffers. Use the opportunity to avoid overloading the buffer address with the log relative address, and to shed the unused return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1f9489be |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: don't use REQ_PREFLUSH for split log writes If we have to split a log write because it wraps the end of the log we can't just use REQ_PREFLUSH to flush before the first log write, as the writes might get reordered somewhere in the I/O stack. Issue a manual flush in that case so that the ordering of the two log I/Os doesn't matter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
366fc4b8 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove XLOG_STATE_IOABORT This value is the only flag in ic_state, which we otherwise use as a state. Switch it to a new debug-only field and also report an actual error in the buffer in the I/O completion path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9bff3132 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: reformat xlog_get_lowest_lsn Reformat xlog_get_lowest_lsn to our usual style. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
4f62282a |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup xlog_get_iclog_buffer_size We don't really need all the messy branches in the function, as it really does three things, out of which 2 are common for all branches: 1) set up mount point log buffer size and count values if not already done from mount options 2) calculate the number of log headers 3) set up all the values in struct xlog based on the above Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
76ce9823 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the l_iclog_size_log field from struct xlog This field is never used, so we can simply kill it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1e85a367 |
|
28-Jun-2019 |
Christoph Hellwig <hch@lst.de> |
xfs: remove the no-op spinlock_destroy stub Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
d31d7185 |
|
23-May-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix broken log reservation debugging xlog_print_tic_res() is supposed to print a human readable string for each element of the log ticket reservation array. Unfortunately, I forgot to update the string array when we added rmap & reflink support, so the debug message prints "region[3]: (null) - 352 bytes" which isn't useful at all. Add the missing elements and add a build check so that we don't forget again to add a string when adding a new XLOG_REG_TYPE. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
39353ff6 |
|
12-Apr-2019 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: replace the BAD_SUMMARY mount flag with the equivalent health code Replace the BAD_SUMMARY mount flag with calls to the equivalent health tracking code. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
a0e336ba |
|
03-Aug-2018 |
Huang Chong <huang.chong@zte.com.cn> |
xfs: fix a comment in xfs_log_reserve Fix the comment in xfs_log_reserve to avoid confusion. Signed-off-by: Huang Chong <huang.chong@zte.com.cn> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0c60d3aa |
|
01-Aug-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor log recovery check Add a predicate to decide if the log is actively in recovery and use that instead of open-coding a pagf_init check in the attr leaf verifier. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
f467cad9 |
|
20-Jul-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: force summary counter recalc at next mount Use the "bad summary count" mount flag from the previous patch to skip writing the unmount record to force log recovery at the next mount, which will recalculate the summary counters for us. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
53235f22 |
|
20-Jul-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: refactor unmount record write Refactor the writing of the unmount record into a separate helper. No functionality changes. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9bb54cb5 |
|
07-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up MIN/MAX Get rid of the MIN/MAX macros and just use the native min/max macros directly in the XFS code. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
0b61f8a4 |
|
05-Jun-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert to SPDX license tags Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e6631f85 |
|
09-May-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: get rid of the log item descriptor It's just a connector between a transaction and a log item. There's a 1:1 relationship between a log item descriptor and a log item, and a 1:1 relationship between a log item descriptor and a transaction. Both relationships are created and terminated at the same time, so why do we even have the descriptor? Replace it with a specific list_head in the log item and a new log item dirtied flag to replace the XFS_LID_DIRTY flag. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [darrick: fix up deferred agfl intent finish_item use of LID_DIRTY] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
22525c17 |
|
09-May-2018 |
Dave Chinner <dchinner@redhat.com> |
xfs: log item flags are racy The log item flags contain a field that is protected by the AIL lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to set and clear these flags, but most of the updates and checks are not done with the AIL lock held and so are susceptible to update races. Fix this by changing the log item flags to use atomic bitops rather than be reliant on the AIL lock for update serialisation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a1f69417 |
|
06-Apr-2018 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: non-scrub - remove unused function parameters Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3e4da466 |
|
14-Mar-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: unwind the try_again loop in xfs_log_force Instead split out a __xfs_log_force_lsn helper that gets called again with the already_slept flag set to true in case we had to sleep. This prepares for aio_fsync support. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
93806299 |
|
14-Mar-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_log_force_lsn Use the smallest possible loop as preamble to find the correct iclog buffer, and then use gotos for unwinding to straighten the code. Also fix the top of function comment while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e6b96570 |
|
14-Mar-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: refactor xfs_log_force Streamline the conditionals so that it is more obvious which specific case from the top of the function comments is being handled. Use gotos only for early returns. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
656de4ff |
|
14-Mar-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: merge _xfs_log_force_lsn and xfs_log_force_lsn Switch to a single interface for flushing the log to a specific LSN, which gives consistent trace point coverage and a less confusing interface. There was only a single user of the previous xfs_log_force_lsn function, which now also passes a NULL log_flushed argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
60e5bb78 |
|
14-Mar-2018 |
Christoph Hellwig <hch@lst.de> |
xfs: merge _xfs_log_force and xfs_log_force Switch to a single interface for flushing the whole log, which gives consistent trace point coverage, and removes the unused log_flushed argument for the previous _xfs_log_force callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
57e80956 |
|
07-Mar-2018 |
Matthew Wilcox <willy@infradead.org> |
xfs: Rename xa_ elements to ail_ This is a simple rename, except that xa_ail becomes ail_head. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
643c8c05 |
|
24-Jan-2018 |
Carlos Maiolino <cmaiolino@redhat.com> |
Use list_head infra-structure for buffer's log items list Now that buffer's b_fspriv has been split, just replace the current singly linked list of xfs_log_items, by the list_head infrastructure. Also, remove the xfs_log_item argument from xfs_buf_resubmit_failed_buffers(), there is no need for this argument, once the log items can be walked through the list_head in the buffer. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style cleanups] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
fb1755a6 |
|
24-Jan-2018 |
Carlos Maiolino <cmaiolino@redhat.com> |
Split buffer's b_fspriv field By splitting the b_fspriv field into two different fields (b_log_item and b_li_list), it's possible to get rid of an old ABI workaround, by using the new b_log_item field to store xfs_buf_log_item separated from the log items attached to the buffer, which will be linked in the new b_li_list field. This way, there is no more need to reorder the log items list to place the buf_log_item at the beginning of the list, simplifying a bit the logic to handle buffer IO. This also opens the possibility to change buffer's log items list into a proper list_head. b_log_item field is still defined as a void *, because it is still used by the log buffers to store xlog_in_core structures, and there is no need to add an extra field on xfs_buf just for xlog_in_core. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: minor style changes] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
c9690043 |
|
09-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: use %px for data pointers when debugging Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of pointer"), the behavior of the raw '%p' printk format specifier was changed to print a 32-bit hash of the pointer value to avoid leaking kernel pointers into dmesg. For most situations that's good. This is /undesirable/ behavior when we're trying to debug XFS, however, so define a PTR_FMT that prints the actual pointer when we're in debug mode. Note that %p for tracepoints still prints the raw pointer, so in the long run we could consider rewriting some of these messages as tracepoints. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
3d170aa2 |
|
09-Jan-2018 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: change 0x%p -> %p in print messages Since %p prepends "0x" to the outputted string, we can drop the prefix. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
2c8f6265 |
|
08-Jan-2018 |
Brian Foster <bfoster@redhat.com> |
xfs: print transaction log reservation on overrun The transaction dump code displays the content and reservation consumption of a particular transaction in the event of an overrun. It currently displays the reservation associated with the transaction ticket, but not the original reservation attached to the transaction. The latter value reflects the original transaction reservation calculation before additional reservation overhead is assigned, such as for the CIL context header and potential split region headers. Update xlog_print_trans() to also print the original transaction reservation in the event of overrun. This provides a reference point to identify how much reservation overhead was added to a particular ticket by xfs_log_calc_unit_res(). Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
1751e8a6 |
|
27-Nov-2017 |
Linus Torvalds <torvalds@linux-foundation.org> |
Rename superblock flags (MS_xyz -> SB_xyz) This is a pure automated search-and-replace of the internal kernel superblock flags. The s_flags are now called SB_*, with the names and the values for the moment mirroring the MS_* flags that they're equivalent to. Note how the MS_xyz flags are the ones passed to the mount system call, while the SB_xyz flags are what we then use in sb->s_flags. The script to do this was: # places to look in; re security/*: it generally should *not* be # touched (that stuff parses mount(2) arguments directly), but # there are two places where we really deal with superblock flags. FILES="drivers/mtd drivers/staging/lustre fs ipc mm \ include/linux/fs.h include/uapi/linux/bfs_fs.h \ security/apparmor/apparmorfs.c security/apparmor/include/lib.h" # the list of MS_... constants SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \ DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \ POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \ I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \ ACTIVE NOUSER" SED_PROG= for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done # we want files that contain at least one of MS_..., # with fs/namespace.c and fs/pnode.c excluded. L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c') for f in $L; do sed -i $f $SED_PROG; done Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
181fdfe6 |
|
06-Nov-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: mark xlog_verify_dest_ptr STATIC We already did it in the forward declaration, but not for the function body itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e9e899a2 |
|
31-Oct-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: move error injection tags into their own file Move the error injection tag names into a libxfs header so that we can share it between kernel and userspace. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
9c92ee20 |
|
25-Oct-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: validate sb_logsunit is a multiple of the fs blocksize Make sure the log stripe unit is sane before proceeding with mounting. AFAICT this means that logsunit has to be 0, 1, or a multiple of the fs block size. Found this by setting the LSB of logsunit in xfs/350 and watching the system crash as soon as we try to write to the log. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
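The rule stated in this commit (logsunit must be 0, 1, or a multiple of the filesystem block size) can be sketched as a small predicate. The function name and the sample values are hypothetical; this is not the kernel's validation code.

```python
def logsunit_is_sane(logsunit, blocksize):
    """True if the log stripe unit is 0, 1, or a multiple of the block size."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    return logsunit in (0, 1) or logsunit % blocksize == 0


blocksize = 4096
for logsunit in (0, 1, 32768, 32769):
    verdict = "ok" if logsunit_is_sane(logsunit, blocksize) else "reject"
    print(f"logsunit={logsunit}: {verdict}")
```

The value 32769 stands in for the fuzzing case mentioned in the commit message, where setting the least significant bit of an otherwise valid logsunit was enough to crash the system on the first log write.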
|
#
f1b92bbc |
|
26-Oct-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: drain the buffer LRU on mount Log recovery of v4 filesystems does not use buffer verifiers because log recovery historically can result in transient buffer corruption when target buffers might be ahead of the log after a crash. v5 filesystems work around this problem with metadata LSN ordering. While this log recovery verifier behavior is necessary on v4 supers, it can result in leaving buffers around in the LRU without verifiers attached for a significant amount of time. This leads to use of unverified buffers while the filesystem is in active use, long after recovery has completed. To address this problem, drain all buffers from the LRU as a final step of the log mount sequence. Note that this is done unconditionally to provide a consistently clean cache footprint, regardless of superblock version or log state. As a side effect, this ensures that all cache resident, unverified buffers are reclaimed after log recovery and therefore must be recreated with verifiers on subsequent use. Reported-by: Darrick Wong <darrick.wong@oracle.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
749f24f3 |
|
09-Oct-2017 |
Thomas Meyer <thomas@m3y3r.de> |
xfs: Fix bool initialization/comparison Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
47c7d0b1 |
|
30-Aug-2017 |
Amir Goldstein <amir73il@gmail.com> |
xfs: fix incorrect log_flushed on fsync When calling into _xfs_log_force{,_lsn}() with a pointer to log_flushed variable, log_flushed will be set to 1 if: 1. xlog_sync() is called to flush the active log buffer AND/OR 2. xlog_wait() is called to wait on a syncing log buffer. xfs_file_fsync() checks the value of log_flushed after _xfs_log_force_lsn() call to optimize away an explicit PREFLUSH request to the data block device after writing out all the file's pages to disk. This optimization is incorrect in the following sequence of events: Task A calls xfs_file_fsync() -> _xfs_log_force_lsn() -> xlog_sync() [submit PREFLUSH]; Task B then calls xfs_file_fsync() -> file_write_and_wait_range() [submit WRITE X] [endio WRITE X], followed by _xfs_log_force_lsn() -> xlog_wait(); finally [endio PREFLUSH] arrives. The write X is not guaranteed to be on persistent storage when the PREFLUSH request is completed, because write X was submitted after the PREFLUSH request, but xfs_file_fsync() of task B will be notified of log_flushed=1 and will skip explicit flush. If the system crashes after fsync of task B, write X may not be present on disk after reboot. This bug was discovered and demonstrated using Josef Bacik's dm-log-writes target, which can be used to record block io operations and then replay a subset of these operations onto the target device. The test goes something like this: - Use fsx to execute ops of a file and record ops on log device - Every now and then fsync the file, store md5 of file and mark the location in the log - Then replay log onto device for each mark, mount fs and compare md5 of file to stored value Cc: Christoph Hellwig <hch@lst.de> Cc: Josef Bacik <jbacik@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
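The ordering problem in this commit can be reduced to one rule: a cache flush only makes durable the writes that had already completed when the flush was submitted. The sketch below applies that rule to the event order from the commit message; the event labels and the `flush_covers` helper are hypothetical simplifications, not kernel interfaces.

```python
def flush_covers(write_done_at, flush_submitted_at):
    """A flush persists only writes that completed before it was submitted."""
    return write_done_at < flush_submitted_at


# Event order taken from the sequence in the commit message.
timeline = [
    "submit PREFLUSH",   # issued by xlog_sync() on behalf of the first fsync
    "submit WRITE X",    # second fsync writes out its file data
    "endio WRITE X",
    "xlog_wait",         # second fsync waits on the syncing log buffer
    "endio PREFLUSH",
]

flush_at = timeline.index("submit PREFLUSH")
write_done = timeline.index("endio WRITE X")

# The log force reports log_flushed=1, yet the flush it refers to was
# submitted before WRITE X completed, so it says nothing about WRITE X.
print("WRITE X covered by the PREFLUSH:", flush_covers(write_done, flush_at))
```

In this model the fsync that issued WRITE X can only skip its own explicit flush when `flush_covers` is true, which is not the case for the sequence above.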
|
#
799ea9e9 |
|
18-Aug-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: evict all inodes involved with log redo item When we introduced the bmap redo log items, we set MS_ACTIVE on the mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes from being truncated prematurely during log recovery. This also had the effect of putting linked inodes on the lru instead of evicting them. Unfortunately, we neglected to find all those unreferenced lru inodes and evict them after finishing log recovery, which means that we leak them if anything goes wrong in the rest of xfs_mountfs, because the lru is only cleaned out on unmount. Therefore, evict unreferenced inodes in the lru list immediately after clearing MS_ACTIVE. Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: viro@ZenIV.linux.org.uk Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
6f4a1eef |
|
08-Aug-2017 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: toggle readonly state around xfs_log_mount_finish When we do log recovery on a readonly mount, unlinked inode processing does not happen due to the readonly checks in xfs_inactive(), which are trying to prevent any I/O on a readonly mount. This is misguided - we do I/O on readonly mounts all the time, for consistency; for example, log recovery. So do the same RDONLY flag twiddling around xfs_log_mount_finish() as we do around xfs_log_mount(), for the same reason. This all cries out for a big rework but for now this is a simple fix to an obvious problem. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
757a69ef |
|
08-Aug-2017 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: write unmount record for ro mounts There are dueling comments in the xfs code about intent for log writes when unmounting a readonly filesystem. In xfs_mountfs, we see the intent:

	/*
	 * Now the log is fully replayed, we can transition to full read-only
	 * mode for read-only mounts. This will sync all the metadata and clean
	 * the log so that the recovery we just performed does not have to be
	 * replayed again on the next mount.
	 */

and it calls xfs_quiesce_attr(), but by the time we get to xfs_log_unmount_write(), it returns early for a RDONLY mount:

	 * Don't write out unmount record on read-only mounts.

Because of this, sequential ro mounts of a filesystem with a dirty log will replay the log each time, which seems odd. Fix this by writing an unmount record even for RO mounts, as long as norecovery wasn't specified (don't write a clean log record if a dirty log may still be there!) and the log device is writable. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
8204f8dd |
|
10-Aug-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: clear MS_ACTIVE after finishing log recovery Way back when we established inode block-map redo log items, it was discovered that we needed to prevent the VFS from evicting inodes during log recovery because any given inode might have bmap redo items to replay even if the inode has no link count and is ultimately deleted, and any eviction of an unlinked inode causes the inode to be truncated and freed too early. To make this possible, we set MS_ACTIVE so that inodes would not be torn down immediately upon release. Unfortunately, this also results in the quota inodes not being released at all if a later part of the mount process should fail, because we never reclaim the inodes. So, set MS_ACTIVE right before we do the last part of log recovery and clear it immediately after we finish the log recovery so that everything will be torn down properly if we abort the mount. Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com>
|
#
244e3dea |
|
26-Jun-2017 |
kbuild test robot <fengguang.wu@intel.com> |
xfs: fix semicolon.cocci warnings fs/xfs/xfs_log.c:2092:38-39: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: d4ca1d550d05 ("xfs: dump transaction usage details on log reservation overrun") CC: Brian Foster <bfoster@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
3e88a007 |
|
27-Jun-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: replace log_badcrc_factor knob with error injection tag Now that error injection tags support dynamic frequency adjustment, replace the debug mode sysfs knob that controls log record CRC error injection with an error injection tag. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
9e24cfd0 |
|
20-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove unneeded parameter from XFS_TEST_ERROR Since we moved the injected error frequency controls to the mountpoint, we can get rid of the last argument to XFS_TEST_ERROR. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
c8ce540d |
|
16-Jun-2017 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: remove double-underscore integer types This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors:

	s/typedef\t__uint8_t/typedef __uint8_t\t/g
	s/typedef\t__uint/typedef __uint/g
	s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
	s/__uint8_t\t/__uint8_t\t\t/g
	s/__uint/uint/g
	s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
	s/__int/int/g
	/^typedef.*int[0-9]*_t;$/d

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
d4ca1d55 |
|
14-Jun-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: dump transaction usage details on log reservation overrun If a transaction log reservation overrun occurs, the ticket data associated with the reservation is dumped in xfs_log_commit_cil(). This occurs long after the transaction items and details have been removed from the transaction and effectively lost. This limited set of ticket data provides very little information to support debugging transaction overruns based on the typical report. To improve transaction log reservation overrun reporting, create a helper to dump transaction details such as log items, log vector data, etc., as well as the underlying ticket data for the transaction. Move the overrun detection from xfs_log_commit_cil() to xlog_cil_insert_items() so it occurs prior to migration of the logged items to the CIL. Call the new helper such that it is able to dump this transaction data before it is lost. Also, warn on overrun to provide callstack context for the offending transaction and include a few additional messages from xlog_cil_insert_items() to display the reservation consumed locally for overhead such as log vector headers, split region headers and the context ticket. This provides a complete general breakdown of the reservation consumption of a transaction when/if it happens to overrun the reservation. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
7d2d5653 |
|
14-Jun-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: separate shutdown from ticket reservation print helper xlog_print_tic_res() pre-dates delayed logging and the committed items list (CIL) and thus retains some factoring warts, such as hard coded function names in the output and the fact that it induces a shutdown. In preparation for more detailed logging of regular transaction overrun situations, refactor xlog_print_tic_res() to be slightly more generic. Reword some of the warning messages and pull the shutdown into the callers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e2a64192 |
|
21-Apr-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: corruption needs to respect endianess too! At least if we want to be able to recognize the pattern. Add a missing byte swap to the corruption injection case in xlog_sync. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
696a5620 |
|
28-Mar-2017 |
Brian Foster <bfoster@redhat.com> |
xfs: use dedicated log worker wq to avoid deadlock with cil wq The log covering background task used to be part of the xfssyncd workqueue. That workqueue was removed as of commit 5889608df ("xfs: syncd workqueue is no more") and the associated work item scheduled to the xfs-log wq. The latter is used for log buffer I/O completion. Since xfs_log_worker() can invoke a log flush, a deadlock is possible between the xfs-log and xfs-cil workqueues. Consider the following codepath from xfs_log_worker():

	xfs_log_worker()
	  xfs_log_force()
	    _xfs_log_force()
	      xlog_cil_force()
	        xlog_cil_force_lsn()
	          xlog_cil_push_now()
	            flush_work()

The above is in xfs-log wq context and blocked waiting on the completion of an xfs-cil work item. Concurrently, the cil push in progress can end up blocked here:

	xlog_cil_push_work()
	  xlog_cil_push()
	    xlog_write()
	      xlog_state_get_iclog_space()
	        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O completion, which executes in xfs-log wq context. In this scenario both workqueues are deadlocked waiting on each other. Add a new workqueue specifically for the high level log covering and ail pushing worker, as was the case prior to commit 5889608df. Diagnosed-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
84a4620c |
|
09-Jan-2017 |
Christoph Hellwig <hch@lst.de> |
xfs: don't print warnings when xfs_log_force fails There are only two reasons for xfs_log_force / xfs_log_force_lsn to fail: one is an I/O error, for which xlog_bdstrat already logs a warning, and the second is an already shutdown log due to a previous I/O error. In the latter case we'll already have a previous indication for the actual error, but the large stream of misleading warnings from xfs_log_force will probably scroll it out of the message buffer. Simply removing the warnings thus makes the XFS log reporting significantly better. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
2291dab2 |
|
08-Dec-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: Always flush caches when integrity is required There is no reason anymore for not issuing device integrity operations when the filesystem requires ordering or data integrity guarantees. We should always issue cache flushes and FUA writes where necessary and let the underlying storage optimise them as necessary for correct integrity operation. Signed-Off-By: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
cae028df |
|
04-Dec-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: optimise CRC updates Nick Piggin reported that the CRC overhead in an fsync heavy workload was higher than expected on a Power8 machine. Part of this was to do with the fact that the power8 CRC implementation is not efficient for CRC lengths of less than 512 bytes, and so the way we split the CRCs over the CRC field means a lot of the CRCs are reduced to being less than optimal size. To optimise this, change the CRC update mechanism to zero the CRC field first, and then compute the CRC in one pass over the buffer and write the result back into the buffer. We can do this safely because anything writing a CRC has exclusive access to the buffer the CRC is being calculated over. We leave the CRC verify code the same - it still splits the CRC calculation - because we do not want read-only operations modifying the underlying buffer. This is because read-only operations may not have an exclusive access to the buffer guaranteed, and so temporary modifications could leak out to other processes accessing the buffer concurrently. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
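The update/verify asymmetry described above can be sketched in plain C. This is only an illustration under stated assumptions, not the kernel code: `crc32c()`, `crc_update()` and `crc_verify()` are hypothetical names, the CRC is a slow bitwise CRC32C, and the checksum is stored in host byte order rather than the on-disk little-endian format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/*
 * Writer side: the caller owns the buffer exclusively, so zero the CRC
 * field in place and checksum the whole buffer in a single pass.
 */
static void crc_update(unsigned char *buf, size_t len, size_t crc_off)
{
	uint32_t crc;

	assert(crc_off + sizeof(crc) <= len);
	memset(buf + crc_off, 0, sizeof(crc));
	crc = ~crc32c(~0u, buf, len);
	memcpy(buf + crc_off, &crc, sizeof(crc));
}

/*
 * Reader side: the buffer may be shared, so never modify it. Checksum the
 * bytes before the field, a zero word standing in for the field, and the
 * bytes after it (the split calculation).
 */
static int crc_verify(const unsigned char *buf, size_t len, size_t crc_off)
{
	const uint32_t zero = 0;
	uint32_t stored, crc;

	assert(crc_off + sizeof(stored) <= len);
	memcpy(&stored, buf + crc_off, sizeof(stored));
	crc = crc32c(~0u, buf, crc_off);
	crc = crc32c(crc, &zero, sizeof(zero));
	crc = crc32c(crc, buf + crc_off + sizeof(stored),
		     len - crc_off - sizeof(stored));
	return (uint32_t)~crc == stored;
}
```

Both paths produce the same value because the split calculation feeds zeros where the writer zeroed the field; only the writer gets the benefit of one long CRC run.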
|
#
c891c30a |
|
19-Jul-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: exclude never-released buffers from buftarg I/O accounting The upcoming buftarg I/O accounting mechanism maintains a count of all buffers that have undergone I/O in the current hold-release cycle. Certain buffers associated with core infrastructure (e.g., the xfs_mount superblock buffer, log buffers) are never released, however. This means that accounting I/O submission on such buffers elevates the buftarg count indefinitely and could lead to lockup on unmount. Define a new buffer flag to explicitly exclude buffers from buftarg I/O accounting. Set the flag on the superblock and associated log buffers. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
0d5a75e9 |
|
01-Jun-2016 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: make several functions static Al Viro noticed that xfs_lock_inodes should be static, and that led to ... a few more. These are just the easy ones, others require moving functions higher in source files, so that's not done here to keep this review simple. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9f27889f |
|
05-Apr-2016 |
Carlos Maiolino <cmaiolino@redhat.com> |
xfs: Add caller function output to xfs_log_force tracepoint I had sent this patch yesterday, but for some reason it didn't reach xfs list, sending again. Outputting the caller of xfs_log_force might be useful when tracing log checkpoint problems without the need to build the kernel with DEBUG. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
710b1e2c |
|
05-Apr-2016 |
Christoph Hellwig <hch@lst.de> |
xfs: remove transaction types These aren't used for CIL-style logging and can be dropped. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5110cd82 |
|
06-Mar-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: use named array initializers for log item dumping Use named array initializers for the string arrays used to dump log items, rather than depending on the order being maintained correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
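As a rough illustration of why named (designated) initializers are safer for such string tables, here is a minimal sketch. The enum values and strings are hypothetical and are not the kernel's log item or region types; the point is that each string is tied to its index by name, and unnamed slots are NULL rather than silently shifted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately sparse type numbering. */
enum reg_type {
	REG_TYPE_BFORMAT = 1,
	REG_TYPE_BCHUNK  = 2,
	REG_TYPE_COMMIT  = 5,
	REG_TYPE_MAX
};

/* Each entry names its own slot, so reordering the lines cannot mislabel a type. */
static const char *const reg_type_str[REG_TYPE_MAX] = {
	[REG_TYPE_BFORMAT] = "bformat",
	[REG_TYPE_BCHUNK]  = "bchunk",
	[REG_TYPE_COMMIT]  = "commit",
};

/* Bounds-checked lookup: unknown or unnamed types get a placeholder string. */
static const char *reg_type_name(unsigned int type)
{
	if (type >= REG_TYPE_MAX || !reg_type_str[type])
		return "bad-rtype";
	return reg_type_str[type];
}
```

The bounds check in the lookup also guards against indexing past the end of the table when a new type is added without a matching string.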
|
#
12877da5 |
|
09-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove XFS_BUF_ZEROFLAGS macro The places where we use this macro already clear unnecessary IO flags (e.g. through xfs_bwrite()) or never have unexpected IO flags set on them in the first place (e.g. iclog buffers). Remove the macro from these locations, and where necessary clear only the specific flags that are conditional in the current buffer context. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b68c0821 |
|
09-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove XBF_WRITE flag wrapper macros They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
1157b32c |
|
09-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove XBF_ASYNC flag wrapper macros They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b0388bf1 |
|
09-Feb-2016 |
Dave Chinner <dchinner@redhat.com> |
xfs: remove XBF_DONE flag wrapper macros They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
609adfc2 |
|
04-Jan-2016 |
Brian Foster <bfoster@redhat.com> |
xfs: debug mode log record crc error injection XFS now uses CRC verification over a limited section of the log to detect torn writes prior to a crash. This is difficult to test directly due to the timing and hardware requirements to cause a short write. Add a mechanism to inject CRC errors into log records to facilitate testing torn write detection during log recovery. This mechanism is dangerous and can result in filesystem corruption. Thus, it is only available in DEBUG mode for testing/development purposes. Set a non-zero value to the following sysfs entry to enable error injection: /sys/fs/xfs/<dev>/log/log_badcrc_factor Once enabled, XFS intentionally writes an invalid CRC to a log record at some random point in the future based on the provided frequency. The filesystem immediately shuts down once the record has been written to the physical log to prevent metadata writeback (e.g., AIL insertion) once the log write completes. This helps reasonably simulate a torn write to the log as the affected record must be safe to discard. The next mount after the intentional shutdown requires log recovery and should detect and recover from the torn write. Note again that this _will_ result in data loss or worse. For testing and development purposes only! Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9b434a34 |
|
03-Jan-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
xfs: fix log ticket type printing Update the log ticket reservation type printing code to reflect all the types of log tickets, to avoid incorrect debug output and avoid running off the end of the array. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ff6d6af2 |
|
12-Oct-2015 |
Bill O'Donnell <billodo@redhat.com> |
xfs: per-filesystem stats counter implementation This patch modifies the stats counting macros and the callers to those macros to properly increment, decrement, and add-to the xfs stats counts. The counts for global and per-fs stats are correctly advanced, and cleared by writing a "1" to the corresponding clear file.

	global counts: /sys/fs/xfs/stats/stats
	per-fs counts: /sys/fs/xfs/sda*/stats/stats
	global clear:  /sys/fs/xfs/stats/stats_clear
	per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear

[dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around stats structures and macros. ] Signed-off-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
91f9f5fe |
|
11-Oct-2015 |
Eric Sandeen <sandeen@redhat.com> |
xfs: avoid null *src in memcpy call in xlog_write The gcc undefined behavior sanitizer caught this; surely any sane memcpy implementation will no-op if size == 0, but behavior with a *src of NULL is technically undefined (declared nonnull), so avoid it here. We are actually in this situation frequently via xlog_commit_record(), because:

	struct xfs_log_iovec reg = {
		.i_addr = NULL,
		.i_len = 0,
		.i_type = XLOG_REG_TYPE_COMMIT,
	};

Reported-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
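The kind of guard this message describes can be sketched as follows. `copy_region()` is a hypothetical helper written for illustration; it is not the code in xlog_write(), only one way to keep a NULL source with zero length away from memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a region only when there is something to copy.
 * A zero-length region may legitimately carry a NULL source pointer, and
 * passing NULL to memcpy() is undefined even when the length is 0.
 */
static size_t copy_region(void *dst, const void *src, size_t len)
{
	if (len == 0)
		return 0;	/* skip the call entirely; src may be NULL */
	assert(src != NULL);
	memcpy(dst, src, len);
	return len;
}
```

Returning the number of bytes copied lets a caller advance its destination offset uniformly for empty and non-empty regions.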
|
#
a45086e2 |
|
11-Oct-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: validate metadata LSNs against log on v5 superblocks Since the onset of v5 superblocks, the LSN of the last modification has been included in a variety of on-disk data structures. This LSN is used to provide log recovery ordering guarantees (e.g., to ensure an older log recovery item is not replayed over a newer target data structure). While this works correctly from the point a filesystem is formatted and mounted, userspace tools have some problematic behaviors that defeat this mechanism. For example, xfs_repair historically zeroes out the log unconditionally (regardless of whether corruption is detected). If this occurs, the LSN of the filesystem is reset and the log is now in a problematic state with respect to on-disk metadata structures that might have a larger LSN. Until either the log catches up to the highest previously used metadata LSN or each affected data structure is modified and written out without incident (which resets the metadata LSN), log recovery is susceptible to filesystem corruption. This problem is ultimately addressed and repaired in the associated userspace tools. The kernel is still responsible to detect the problem and notify the user that something is wrong. Check the superblock LSN at mount time and fail the mount if it is invalid. From that point on, trigger verifier failure on any metadata I/O where an invalid LSN is detected. This results in a filesystem shutdown and guarantees that we do not log metadata changes with invalid LSNs on disk. Since this is a known issue with a known recovery path, present a warning to instruct the user how to recover. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a3f20014 |
|
18-Aug-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: checksum log record ext headers based on record size The first 4 bytes of every basic block in the physical log are stamped with the current lsn. To support this mechanism, the log record header (first block of each new log record) contains space for the original first byte of each log record block before it is replaced with the lsn. The log record header has space for 32k worth of blocks. The version 2 log adds new extended record headers for each additional 32k worth of blocks beyond what is supported by the record header. The log record checksum incorporates the log record header, the extended headers and the record payload. xlog_cksum() checksums the extended headers based on log->l_iclog_heads, which specifies the number of extended headers in a log record based on the log buffer size mount option. The log buffer size is variable, however, and thus means the checksum can be calculated differently based on how a filesystem is mounted. This is problematic if a filesystem crashes and recovery occurs on a subsequent mount using a different log buffer size. For example, crash an active filesystem that is mounted with the default (32k) logbsize, attempt remount/recovery using '-o logbsize=64k' and the mount fails on or warns about log checksum failures. To avoid this problem, update xlog_cksum() to calculate the checksum based on the size of the log buffer according to the log record. The size is already included in the h_size field of the log record header and thus is available at log recovery time. Extended log record headers are also only written when the log record is large enough to require them. This makes checksum calculation of log records consistent with the extended record header mechanism as well as how on-disk records are checksummed with various log buffer size mount options. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
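The header arithmetic implied by the message can be sketched like this: one header covers each 32k of record, so the number of extended headers follows from the record's own size rather than from a mount option. The constant and function names below are hypothetical and the code is an assumption-laden illustration, not xlog_cksum() itself.

```c
#include <assert.h>

/* Assumed from the text above: each header covers 32k worth of blocks. */
#define HEADER_CYCLE_SIZE	(32u * 1024u)

/* Total headers needed for a record of h_size bytes (always at least one). */
static unsigned int record_headers(unsigned int h_size)
{
	unsigned int n = h_size / HEADER_CYCLE_SIZE;

	if (h_size % HEADER_CYCLE_SIZE)
		n++;
	return n ? n : 1;
}

/* Extended headers are whatever is needed beyond the first record header. */
static unsigned int ext_headers(unsigned int h_size)
{
	return record_headers(h_size) - 1;
}
```

Because the count depends only on the size recorded in the header, a record written with one log buffer size is checksummed the same way when recovered under a different one.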
|
#
f0b2efad |
|
18-Aug-2015 |
Brian Foster <bfoster@redhat.com> |
xfs: don't leave EFIs on AIL on mount failure Log recovery occurs in two phases at mount time. In the first phase, EFIs and EFDs are processed and potentially cancelled out. EFIs without EFD objects are inserted into the AIL for processing and recovery in the second phase. xfs_mountfs() runs various other operations between the phases and is thus subject to failure. If failure occurs after the first phase but before the second, pending EFIs sit on the AIL, pin it and cause the mount to hang. Update the mount sequence to ensure that pending EFIs are cancelled in the event of failure. Add a recovery cancellation mechanism to iterate the AIL and cancel all EFI items when requested. Plumb cancellation support through the log mount finish helper and update xfs_mountfs() to invoke cancellation in the event of failure after recovery has started. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f41febd2 |
|
28-Jul-2015 |
Joe Perches <joe@perches.com> |
xfs: Use consistent logging message prefixes The second and subsequent lines of multi-line logging messages are not prefixed with the same information as the first line. Separate messages with newlines into multiple calls to ensure consistent prefixing and allow easier grep use. Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b2a922cd |
|
21-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: remove xfs_caddr_t Just use char pointers directly instead of the confusing typedef to a pointer type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5809d5e0 |
|
21-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: use void pointers in log validation helpers Compared to char pointers this saves us a lot of casting effort. Also add another local variable to make the code easier to read. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
db9d67d6 |
|
21-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: remove __psint_t and __psunsigned_t Replace uses of __psint_t with the proper uintptr_t and ptrdiff_t types, and remove the definitions of __psint_t and __psunsigned_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
f78c3901 |
|
03-Jun-2015 |
Christoph Hellwig <hch@lst.de> |
xfs: fix xfs_log_done interface Instead of the confusing flags argument, pass a boolean flag to indicate if we want to release or regrant a log reservation. Also ensure that xfs_log_done always drops the reference on the log ticket, to both simplify the code and make the logic in xfs_trans_roll easier to understand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
61e63ecb |
|
21-Jan-2015 |
Dave Chinner <dchinner@redhat.com> |
xfs: consolidate superblock logging functions We now have several superblock logging functions that are identical except for the transaction reservation and whether it should be a synchronous transaction or not. Consolidate these all into a single function, a single reservation and a sync flag and call it xfs_sync_sb(). Also, xfs_mod_sb() is not really a modification function - it's the operation of logging the superblock buffer. Hence change the name of it to reflect this. Note that we have to change the mp->m_update_flags that are passed around at mount time to a boolean simply to indicate a superblock update is needed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
77af574e |
|
23-Dec-2014 |
Eric Sandeen <sandeen@redhat.com> |
xfs: remove extra newlines from xfs messages xfs_warn() and friends add a newline by default, but some messages add another one. Particularly for the failing write message below, this can waste a lot of console real estate! Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
96ab7954 |
|
23-Dec-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: initialize log buf I/O completion wq on log alloc Log buffer I/O completion passes through the high priority m_log_workqueue rather than the default metadata buffer workqueue. The log buffer wq is initialized at I/O submission time. The log buffers are reused once initialized, however, so this is not necessary. Initialize the log buffer I/O completion workqueue pointers once when the log is allocated and log buffers initialized rather than on every log buffer I/O submission. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
b29c70f5 |
|
03-Dec-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: split metadata and log buffer completion to separate workqueues XFS traditionally sends all buffer I/O completion work to a single workqueue. This includes metadata buffer completion and log buffer completion. The log buffer completion requires a high priority queue to prevent stalls due to log forces getting stuck behind other queued work. Rather than continue to prioritize all buffer I/O completion due to the needs of log completion, split log buffer completion off to m_log_workqueue and move the high priority flag from m_buf_workqueue to m_log_workqueue. Add a b_ioend_wq wq pointer to xfs_buf to allow completion workqueue customization on a per-buffer basis. Initialize b_ioend_wq to m_buf_workqueue by default in the generic buffer I/O submission path. Finally, override the default wq with the high priority m_log_workqueue in the log buffer I/O submission path. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
bb58e618 |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: move most of xfs_sb.h to xfs_format.h More on-disk format consolidation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
4fb6e8ad |
|
27-Nov-2014 |
Christoph Hellwig <hch@lst.de> |
xfs: merge xfs_ag.h into xfs_format.h More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
91ee575f |
|
27-Nov-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: allow lazy sb counter sync during filesystem freeze sequence The expectation since the introduction of the lazy superblock counters is that the counters are synced and superblock logged appropriately as part of the filesystem freeze sequence. This does not occur, however, due to the logic in xfs_fs_writable() that prevents progress when the fs is in any state other than SB_UNFROZEN. While this is a bug, it has not been exposed to date because the last thing XFS does during freeze is dirty the log. The log recovery process recalculates the counters from AGI/AGF metadata to ensure everything is correct. Therefore should a crash occur while an fs is frozen, the subsequent log recovery puts everything back in order. See the following commit for reference: 92821e2b [XFS] Lazy Superblock Counters We might not always want to rely on dirtying the log on a frozen fs. Modify xfs_log_sbcount() to proceed when the filesystem is freezing but not once the freeze process has completed. Modify xfs_fs_writable() to accept the minimum freeze level for which modifications should be blocked to support various codepaths. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
595bff75 |
|
01-Oct-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce xfs_buf_submit[_wait] There is a lot of cookie-cutter code that looks like: if (shutdown) handle buffer error xfs_buf_iorequest(bp) error = xfs_buf_iowait(bp) if (error) handle buffer error spread through XFS. There's significant complexity now in xfs_buf_iorequest() to specifically handle this sort of synchronous IO pattern, but there are all sorts of nasty surprises in different error handling code dependent on who owns the buffer references and the locks. Pull this pattern into a single helper, where we can hide all the synchronous IO warts and hence make the error handling for all the callers much saner. This removes the need for a special extra reference to protect IO completion processing, as we can now hold a single reference across dispatch and waiting, simplifying the sync IO semantics and error handling. In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and make it explicitly handle only asynchronous IO. This forces all users to be switched specifically to one interface or the other and removes any ambiguity between how the interfaces are to be used. It also means that xfs_buf_iowait() goes away. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
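The consolidation described above can be sketched in plain C. This is an illustration only: `struct demo_buf`, `demo_dispatch()` and `demo_buf_submit_wait()` are hypothetical names rather than the kernel's types or functions, and the IO is simulated so that it completes immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer; not the kernel's struct xfs_buf. */
struct demo_buf {
	int  refcount;   /* references currently held on the buffer */
	int  io_error;   /* error the simulated IO will complete with */
	int  b_error;    /* error recorded on the buffer after completion */
	bool shutdown;   /* simulated "filesystem is shut down" state */
};

/* Simulated dispatch: the IO "completes" at once and records its result. */
static void demo_dispatch(struct demo_buf *bp)
{
	bp->b_error = bp->io_error;
}

/*
 * Synchronous submit helper: the shutdown check, the dispatch and the
 * wait live in one place, a single reference is held across dispatch
 * and wait, and callers see every failure through one return value.
 */
static int demo_buf_submit_wait(struct demo_buf *bp)
{
	int error;

	if (bp->shutdown) {
		bp->b_error = -EIO;
		return -EIO;
	}

	bp->refcount++;        /* hold the buffer across dispatch and wait */
	demo_dispatch(bp);     /* real code would sleep here until IO is done */
	error = bp->b_error;
	bp->refcount--;        /* completion observed, drop the hold */
	return error;
}
```

A caller then only checks the return value of `demo_buf_submit_wait()` instead of repeating the shutdown test and the wait at every call site.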
|
#
e8aaba9a |
|
01-Oct-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality We do some work in xfs_buf_ioend, and some work in xfs_buf_iodone_work, but much of that functionality is the same. This work can all be done in a single function, leaving xfs_buf_iodone just a wrapper to determine if we should execute it by workqueue or directly. Hence rename xfs_buf_iodone_work to xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that need async processing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
a870fe6d |
|
01-Oct-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: force the log before shutting down When we have marked the filesystem for shutdown, we want to prevent any further buffer IO from being submitted. However, we currently force the log after marking the filesystem as shut down, hence allowing IO to the log *after* we have marked both the filesystem and the log as in an error state. Clean this up by forcing the log before we mark the filesystem with an error. This replaces the pure CIL flush that we currently have which works around this same issue (i.e. the CIL can't be flushed once the shutdown flags are set) and hence enables us to clean up the logic substantially. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
400b9d88 |
|
03-Aug-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: catch buffers written without verifiers attached We recently had a bug where buffers were slipping through log recovery without any verifier attached to them. This was resulting in on-disk CRC mismatches for valid data. Add some warning code to catch this occurrence so that we catch such bugs during development rather than not being aware they exist. Note that we cannot do this verification unconditionally as non-CRC filesystems don't always attach verifiers to the buffers being written. e.g. during log recovery we cannot identify all the different types of buffers correctly on non-CRC filesystems, so we can't attach the correct verifiers in all cases and so we don't attach any. Hence we don't want to warn on non-CRC filesystems, to avoid spamming the logs with false indications. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
baff4e44 |
|
14-Jul-2014 |
Brian Foster <bfoster@redhat.com> |
xfs: add xlog sysfs kobject and attribute handlers Embed a kobject into the xfs log data structure (xlog). This creates a 'log' subdirectory for every XFS mount instance in sysfs. The lifecycle of the log kobject is tied to the lifecycle of the log. Also define a set of generic attribute handlers associated with the log kobject in preparation for the addition of attributes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
2451337d |
|
24-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: global error sign conversion Convert all the errors in the core XFS code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
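As a rough illustration of what the sign conversion means for callers, the sketch below contrasts the two conventions. All four function names are hypothetical and exist only for this example; they are not XFS functions.

```c
#include <assert.h>
#include <errno.h>

/* Old convention (hypothetical): core code returns a positive errno. */
static int demo_old_core_op(int fail)
{
	return fail ? EIO : 0;
}

/* The interface layer therefore had to negate before returning to the VFS. */
static int demo_old_interface_op(int fail)
{
	return -demo_old_core_op(fail);
}

/* New convention: core code returns a negative errno, like the rest of the kernel. */
static int demo_new_core_op(int fail)
{
	return fail ? -EIO : 0;
}

/* The interface layer now passes the value straight through. */
static int demo_new_interface_op(int fail)
{
	return demo_new_core_op(fail);
}
```

Both interface functions return the same value to their caller; the difference is that the new style needs no negation point between the layers.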
|
#
b474c7ae |
|
21-Jun-2014 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: Nuke XFS_ERROR macro XFS_ERROR was designed long ago to trap return values, but it's not runtime configurable, it's not consistently used, and we can do similar error trapping with ftrace scripts and triggers from userspace. Just nuke XFS_ERROR and associated bits. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
36de9556 |
|
06-Jun-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: kill xfs_buf_geterror() Most of the callers are just calling ASSERT(!xfs_buf_geterror()) which means they are checking for bp->b_error == 0. If bp is null in this case, we will assert fail, and hence it's no different in result to oopsing because of a null bp. In some cases, errors have already been checked for or the function returning the buffer can't return a buffer with an error, so it's just a redundant assert. Either way, the assert can be removed. The other two non-assert callers can just test for a buffer and error properly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
ac983517 |
|
06-May-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: don't sleep in xlog_cil_force_lsn on shutdown Reports of a shutdown hang when fsyncing a directory have surfaced, such as this: [ 3663.394472] Call Trace: [ 3663.397199] [<ffffffff815f1889>] schedule+0x29/0x70 [ 3663.402743] [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs] [ 3663.416249] [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs] [ 3663.429271] [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs] [ 3663.435873] [<ffffffff811df8c5>] do_fsync+0x65/0xa0 [ 3663.441408] [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20 [ 3663.447043] [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b If we trigger a shutdown in xlog_cil_push() from xlog_write(), we will never wake waiters on the current push sequence number, so anything waiting in xlog_cil_force_lsn() for that push sequence number to come up will not get woken and hence stall the shutdown. Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in the push abort handling, in the log shutdown code when waking all waiters, and adding a shutdown check in the sequence completion wait loops to ensure they abort when a wakeup due to a shutdown occurs. Reported-by: Boris Ranto <branto@redhat.com> Reported-by: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
c99d609a |
|
05-May-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: fully support v5 format filesystems We have had this code in the kernel for over a year now and have shaken all the known issues out of the code over the past few releases. It's now time to remove the experimental warnings during mount and fully support the new filesystem format in production systems. Remove the experimental warning, and add a version number to the initial "mounting filesystem" message to tell us what type of filesystem is being mounted. Also, remove the temporary inode cluster size output at mount time now we know that this code works fine. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
9c23eccc |
|
16-Apr-2014 |
Dave Chinner <dchinner@redhat.com> |
xfs: unmount does not wait for shutdown during unmount An interesting situation can occur if a log IO error occurs during the unmount of a filesystem. The cases reported have the same signature - the update of the superblock counters fails due to a log write IO error: XFS (dm-16): xfs_do_force_shutdown(0x2) called from line 1170 of file fs/xfs/xfs_log.c. Return address = 0xffffffffa08a44a1 XFS (dm-16): Log I/O Error Detected. Shutting down filesystem XFS (dm-16): Unable to update superblock counters. Freespace may not be correct on next mount. XFS (dm-16): xfs_log_force: error 5 returned. XFS (¿-¿¿¿): Please umount the filesystem and rectify the problem(s) It can be seen that the last line of output contains a corrupt device name - this is because the log and xfs_mount structures have already been freed by the time this message is printed. A kernel oops closely follows. The issue is that the shutdown is occurring in a separate IO completion thread to the unmount. Once the shutdown processing has started and all the iclogs are marked with XLOG_STATE_IOERROR, the log shutdown code wakes anyone waiting on a log force so they can process the shutdown error. This wakes up the unmount code that is doing a synchronous transaction to update the superblock counters. The unmount path now sees all the iclogs are marked with XLOG_STATE_IOERROR and so never waits on them again, knowing that if it does, there will not be a wakeup trigger for it and we will hang the unmount if we do. Hence the unmount runs through all the remaining code and frees all the filesystem structures while the xlog_iodone() is still processing the shutdown. When the log shutdown processing completes, xfs_do_force_shutdown() emits the "Please umount the filesystem and rectify the problem(s)" message, and xlog_iodone() then aborts all the objects attached to the iclog. An iclog that has already been freed.... The real issue here is that there is no serialisation point between the log IO and the unmount.
We have serialisation points for log writes, log forces, reservations, etc, but we don't actually have any code that waits for log IO to fully complete. We do that for all other types of object, so why not iclogbufs? Well, it turns out that we can easily do this. We've got xfs_buf handles, and that's what everyone else uses for IO serialisation. i.e. bp->b_sema. So, let's hold iclogbufs locked over IO, and only release the lock in xlog_iodone() when we are finished with the buffer. That way before we tear down the iclog, we can lock and unlock the buffer to ensure IO completion has finished completely before we tear it down. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bob Mastors <bob.mastors@solidfire.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
750b9c90 |
|
31-Oct-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: trace AIL manipulations In debugging a log tail issue on a RHEL6 kernel, I added these trace points to trace log items being added, moved and removed in the AIL and how that affected the log tail LSN that was written to the log. They were very helpful in that they immediately identified the cause of the problem being seen. Hence I'd like to always have them available for use. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
643f7c4e |
|
30-Oct-2013 |
Geyslan G. Bem <geyslan@gmail.com> |
xfs: fix possible NULL dereference in xlog_verify_iclog In xlog_verify_iclog a debug check of the incore log buffers prints an error if icptr is null and then goes on to dereference the pointer regardless. Convert this to an assert so that the intention is clear. This was reported by Coverity. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
#
a4fbe6ab |
|
22-Oct-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: decouple inode and bmap btree header files Currently the xfs_inode.h header has a dependency on the definition of the BMAP btree records as the inode fork includes an array of xfs_bmbt_rec_host_t objects in its definition. Move all the btree format definitions from xfs_btree.h, xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to xfs_format.h to continue the process of centralising the on-disk format definitions. With this done, the xfs inode definitions are no longer dependent on btree header files. This enables a massive culling of unnecessary includes, with close to 200 #include directives removed from the XFS kernel code base. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
239880ef |
|
22-Oct-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: decouple log and transaction headers xfs_trans.h has a dependency on xfs_log.h for a couple of structures. Most code that does transactions doesn't need to know anything about the log, but this dependency means that they have to include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header files and clean up the includes to be in dependency order. In doing this, remove the direct include of xfs_trans_reserve.h from xfs_trans.h so that we remove the dependency between xfs_trans.h and xfs_mount.h. Hence the xfs_trans.h include can be moved to indicate the actual dependencies other header files have on it. Note that these are kernel only header files, so this does not translate to any userspace changes at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
70a9883c |
|
22-Oct-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: create a shared header file for format-related information All of the buffer operations structures need to be exported for xfs_db, so move them all to a common location rather than spreading them all over the place. They are verifying the on-disk format, so while xfs_format.h might be a good place, it is not part of the on disk format. Hence we need to create a new header file in which we centralise these related definitions. Start by moving the buffer operations structures, and then also move all the other definitions that have crept into xfs_log_format.h and xfs_format.h as there was no other shared header file to put them in. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
08e96e1a |
|
11-Oct-2013 |
Eric Sandeen <sandeen@sandeen.net> |
xfs: remove newlines from strings passed to __xfs_printk __xfs_printk adds its own "\n". Having it in the original string leads to unintentional blank lines from these messages. Most format strings have no newline, but a few do, leading to i.e.: [ 7347.119911] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119911] [ 7347.119919] XFS (sdb2): Access to block zero in inode 132 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 1a05 [ 7347.119919] Fix them all. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
2c6e24ce |
|
14-Oct-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: prevent deadlock trying to cover an active log Recent analysis of a deadlocked XFS filesystem from a kernel crash dump indicated that the filesystem was stuck waiting for log space. The short story of the hang on the RHEL6 kernel is this: - the tail of the log is pinned by an inode - the inode has been pushed by the xfsaild - the inode has been flushed to its backing buffer and is currently flush locked and hence waiting for backing buffer IO to complete and remove it from the AIL - the backing buffer is marked for write - it is on the delayed write queue - the inode buffer has been modified directly and logged recently due to unlinked inode list modification - the backing buffer is pinned in memory as it is in the active CIL context. - the xfsbufd won't start buffer writeback because it is pinned - xfssyncd won't force the log because it sees the log as needing to be covered and hence wants to issue a dummy transaction to move the log covering state machine along. Hence there is no trigger to force the CIL to the log and hence unpin the inode buffer and therefore complete the inode IO, remove it from the AIL and hence move the tail of the log along, allowing transactions to start again. Mainline kernels also have the same deadlock, though the signature is slightly different - the inode buffer never reaches the delayed write lists because xfs_buf_item_push() sees that it is pinned and hence never adds it to the delayed write list that the xfsaild flushes. There are two possible solutions here. The first is to simply force the log before trying to cover the log and so ensure that the CIL is emptied before we try to reserve space for the dummy transaction in the xfs_log_worker(). While this might work most of the time, it is still racy and is no guarantee that we don't get stuck in xfs_trans_reserve waiting for log space to come free. Hence it's not the best way to solve the problem.
The second solution is to modify xfs_log_need_covered() to be aware of the CIL. We only should be attempting to cover the log if there is no current activity in the log - covering the log is the process of ensuring that the head and tail in the log on disk are identical (i.e. the log is clean and at idle). Hence, by definition, if there are items in the CIL then the log is not at idle and so we don't need to attempt to cover it. When we don't need to cover the log because it is active or idle, we issue a log force from xfs_log_worker() - if the log is idle, then this does nothing. However, if the log is active due to there being items in the CIL, it will force the items in the CIL to the log and unpin them. In the case of the above deadlock scenario, instead of xfs_log_worker() getting stuck in xfs_trans_reserve() attempting to cover the log, it will instead force the log, thereby unpinning the inode buffer, allowing IO to be issued and complete and hence removing the inode that was pinning the tail of the log from the AIL. At that point, everything will start moving along again. i.e. the xfs_log_worker turns back into a watchdog that can alleviate deadlocks based around pinned items that prevent the tail of the log from being moved... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
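The decision described above can be sketched as follows. This is a simplified model, not the kernel code: `struct demo_log_state`, `demo_log_need_covered()` and `demo_log_worker()` are hypothetical names, and the real covering state machine is reduced to a single `covering_wanted` flag as an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>

/* What the periodic worker decides to do on each run. */
enum demo_worker_action {
	DEMO_COVER_LOG,   /* issue the dummy transaction that covers the log */
	DEMO_FORCE_LOG,   /* force the log; a no-op if the log is idle */
};

/* Hypothetical, heavily simplified view of the log state. */
struct demo_log_state {
	bool cil_has_items;    /* items are pending in the CIL */
	bool covering_wanted;  /* covering state machine wants a dummy transaction */
};

/* Covering is only attempted when nothing is pending in the CIL. */
static bool demo_log_need_covered(const struct demo_log_state *log)
{
	if (log->cil_has_items)
		return false;   /* log is active, so by definition not idle */
	return log->covering_wanted;
}

/* Worker: cover an idle log, otherwise force it so pinned items get unpinned. */
static enum demo_worker_action demo_log_worker(const struct demo_log_state *log)
{
	return demo_log_need_covered(log) ? DEMO_COVER_LOG : DEMO_FORCE_LOG;
}
```

In the deadlock scenario the CIL holds the pinned inode buffer, so the worker takes the `DEMO_FORCE_LOG` branch rather than blocking on a reservation for the dummy transaction.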
|
#
a30b0367 |
|
02-Sep-2013 |
Dave Chinner <david@fromorbit.com> |
xfs: fix some minor sparse warnings A couple of simple locking annotations and 0 vs NULL warnings. Nothing that changes any code behaviour, just removes build noise. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
8e159e72 |
|
11-Aug-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
xfs: fix the comment of xfs_log_unmount_write() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
ac0e300f |
|
07-Aug-2013 |
Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> |
xfs: fix the comment of xlog_write_setup_copy() Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
3e7b91cf |
|
12-Aug-2013 |
Jie Liu <jeff.liu@oracle.com> |
xfs: Validate log space at mount time Validate log space during the log mount stage; the underlying function will emit a warning message via syslog at critical level if the log space is too small or too large. [ dchinner: For CRC-enabled filesystems, abort the mounting of the filesystem as mkfs should never make a log too small for the given filesystem configuration. ] [ dchinner: make a note of the fact that the log size limits in block counts are in units of filesystem blocks, not basic blocks. ] Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
e773fc93 |
|
12-Aug-2013 |
Jie Liu <jeff.liu@oracle.com> |
xfs: Refactor xfs_ticket_alloc() to extract a new helper Refactor xlog_ticket_alloc() to extract a new helper, i.e. xfs_log_calc_unit_res(). This helper would be used to calculate the total log reservation size by adding extra log operation/transaction headers for a new log ticket. Signed-off-by: Jie Liu <jeff.liu@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
297aa637 |
|
19-Jul-2013 |
Chandra Seetharaman <sekharan@us.ibm.com> |
xfs: Fix a deadlock in xfs_log_commit_cil() code path While testing and rearranging pquota/gquota code, I stumbled on a xfs_shutdown() during a mount. But the mount just hung. Debugged and found that there is a deadlock involving &log->l_cilp->xc_ctx_lock. It is in a code path where &log->l_cilp->xc_ctx_lock is first acquired in read mode and some levels down the same semaphore is being acquired in write mode causing a deadlock. This is the stack: xfs_log_commit_cil -> acquires &log->l_cilp->xc_ctx_lock in read mode xlog_print_tic_res xfs_force_shutdown xfs_log_force_umount xlog_cil_force xlog_cil_force_lsn xlog_cil_push_foreground xlog_cil_push - tries to acquire same semaphore in write mode This patch fixes the deadlock by changing the reason code for xfs_force_shutdown in xlog_print_tic_res() to SHUTDOWN_LOG_IO_ERROR. SHUTDOWN_LOG_IO_ERROR is the right reason code to be set since we are in the log path. Thanks to Dave for suggesting this solution. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
fd63875c |
|
27-Jun-2013 |
Dave Chinner <dchinner@redhat.com> |
xfs: Introduce ordered log vector support An "ordered log vector" is a log vector that is used for tracking a log item through the CIL and into the AIL as part of the log checkpointing. These ordered log vectors are special in that they are not written to the journal in any way, and are not accounted to the checkpoint being written. The reason for this behaviour is to allow operations to attach items to transactions and have them follow the normal transactional lifecycle without actually having to write them to the journal. This allows logging of items that track high level logical changes and writing them to the log, while the physical items being modified pass through into the AIL and pin the tail of the log (and therefore the logical item in the log) until all the modified items are physically written to disk. IOWs, it allows us to write metadata without physically logging every individual change but still maintain the full transactional integrity guarantees we currently have w.r.t. crash recovery. This change modifies some of the CIL item insertion loops, as ordered log vectors introduce some new constraints as they don't track any data. One advantage of this change is that it combines two log vector chain walks into a single pass, so there is less overhead in the transaction commit pass as well. It also kills some unused code in the log vector walk loop when committing the CIL. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
ecb3403d |
|
04-Mar-2013 |
Akinobu Mita <akinobu.mita@gmail.com> |
xfs: rename random32() to prandom_u32() Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: <bpm@sgi.com> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: xfs@oss.sgi.com Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
003fd6c8 |
|
18-Jan-2013 |
Ben Myers <bpm@sgi.com> |
xfs: fix fs/xfs/xfs_log.c:1740:39: error: 'B_TRUE' undeclared Commit 667a9291c5b3 "xfs: Remove boolean_t typedef completely." didn't. Remove a stray B_TRUE that breaks CONFIG_XFS_DEBUG=y. Signed-off-by: Ben Myers <bpm@sgi.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
|
#
667a9291 |
|
12-Nov-2012 |
Thiago Farina <tfarina@chromium.org> |
xfs: Remove boolean_t typedef completely. Since we are using C99 we have one builtin defined in include/linux/types.h, use that instead. v2: you missed one in fs/xfs/xfs_qm_bhv.c, cleaned up. -bpm Signed-off-by: Thiago Farina <tfarina@chromium.org> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
f9668a09 |
|
27-Nov-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix sparse reported log CRC endian issue Not a bug as such, just warning noise from the xlog_cksum() returning a __be32 type when it should be returning a __le32 type. On Wed, Nov 28, 2012 at 08:30:59AM -0500, Christoph Hellwig wrote: > But why are we storing the crc field little endian while all other on > disk formats are big endian? (And yes I realize it might as well have > been me who did that back in the idea, but I still have no idea why) Because the CRC always returns the calculation in LE format, even on BE systems. So rather than always having to byte swap it everywhere and have all the force casts and annotations for sparse, it seems simpler to just make it a __le32 everywhere.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
437a255a |
|
27-Nov-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: fix direct IO nested transaction deadlock. The direct IO path can do a nested transaction reservation when writing past the EOF. The first transaction is the append transaction for setting the filesize at IO completion, but we can also need a transaction for allocation of blocks. If the log is low on space due to reservations and small log, the append transaction can be granted after waiting for space as the only active transaction in the system. This then attempts a reservation for an allocation, which there isn't space in the log for, and the reservation sleeps. The result is that there is nothing left in the system to wake up all the processes waiting for log space to come free. The stack trace that shows this deadlock is relatively innocuous: xlog_grant_head_wait xlog_grant_head_check xfs_log_reserve xfs_trans_reserve xfs_iomap_write_direct __xfs_get_blocks xfs_get_blocks_direct do_blockdev_direct_IO __blockdev_direct_IO xfs_vm_direct_IO generic_file_direct_write xfs_file_dio_aio_writ xfs_file_aio_write do_sync_write vfs_write This was discovered on a filesystem with a log of only 10MB, and a log stripe unit of 256k which increased the base reservations by 512k. Hence an allocation transaction requires 1.2MB of log space to be available instead of only 260k, and so greatly increased the chance that there wouldn't be enough log space available for the nested transaction to succeed. The key to reproducing it is this mkfs command: mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV The test case was 1000 fsstress processes running with random freeze and unfreezes every few seconds. Thanks to Eryu Guan (eguan@redhat.com) for writing the test that found this on a system with a somewhat unique default configuration.... cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
0e446be4 |
|
12-Nov-2012 |
Christoph Hellwig <hch@lst.de> |
xfs: add CRC checks to the log Implement CRCs for the log buffers. We re-use a field in struct xlog_rec_header that was used for a weak checksum of the log buffer payload in debug builds before. The new checksumming uses the crc32c checksum we will use elsewhere in XFS, and also protects the record header and additional cycle data. Due to this there are some interesting changes in xlog_sync, as we need to do the cycle wrapping for the split buffer case much earlier, as we would touch the buffer after generating the checksum otherwise. The CRC calculation is always enabled, even for non-CRC filesystems, as adding this CRC does not change the log format. On non-CRC filesystems, only issue an alert if a CRC mismatch is found and allow recovery to continue - this will act as an indicator that log recovery problems are a result of log corruption. On CRC enabled filesystems, however, log recovery will fail. Note that existing debug kernels will write a simple checksum value to the log, so the first time this is run on a filesystem that was last used on a debug kernel it will throw CRC mismatch warning errors. These can be ignored. Initially based on a patch from Dave Chinner, then modified significantly by Christoph Hellwig. Modified again by Dave Chinner to get to this version. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
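To make the checksum idea concrete, here is a minimal userspace sketch of a CRC32c computed over a header and a payload in sequence, with the result stored in little-endian byte order. It uses the standard CRC32c parameters (reflected polynomial 0x82F63B78, initial value and final XOR of all ones) in a slow bitwise form; the kernel's exact seeding, its handling of the CRC field inside the header, and its optimised crc32c routines are not shown in the text above and are not reproduced here. `demo_record_cksum()` and `demo_put_le32()` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feed len bytes into a running CRC32c state (bitwise, reflected form). */
static uint32_t crc32c_update(uint32_t state, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		state ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			state = (state >> 1) ^ (0x82F63B78u & (0u - (state & 1u)));
	}
	return state;
}

/* Checksum a record: header bytes first, then the payload bytes. */
static uint32_t demo_record_cksum(const void *hdr, size_t hdr_len,
				  const void *payload, size_t payload_len)
{
	uint32_t state = 0xFFFFFFFFu;   /* standard CRC32c initial value */

	state = crc32c_update(state, hdr, hdr_len);
	state = crc32c_update(state, payload, payload_len);
	return ~state;                  /* standard CRC32c final inversion */
}

/* Store a 32-bit value in little-endian byte order on any host. */
static void demo_put_le32(uint8_t out[4], uint32_t value)
{
	out[0] = (uint8_t)(value & 0xFFu);
	out[1] = (uint8_t)((value >> 8) & 0xFFu);
	out[2] = (uint8_t)((value >> 16) & 0xFFu);
	out[3] = (uint8_t)((value >> 24) & 0xFFu);
}
```

Because the state is carried from the header into the payload, the result equals a single CRC32c over the concatenated bytes, which is why a change to either part is detected.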
|
#
c3f8fc73 |
|
12-Nov-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: make buffer read verication an IO completion function Add a verifier function callback capability to the buffer read interfaces. This will be used by the callers to supply a function that verifies the contents of the buffer when it is read from disk. This patch does not provide callback functions, but simply modifies the interfaces to allow them to be called. The reason for adding this to the read interfaces is that it is very difficult to tell from the outside if a buffer was just read from disk or whether we just pulled it out of cache. Supplying a callback allows the buffer cache to use its internal knowledge of the buffer to execute it only when the buffer is read from disk. It is intended that the verifier functions will mark the buffer with an EFSCORRUPTED error when verification fails. This allows the reading context to distinguish a verification error from an IO error, and potentially take further actions on the buffer (e.g. attempt repair) based on the error reported. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
7e9620f2 |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: only update the last_sync_lsn when a transaction completes The log write code stamps each iclog with the current tail LSN in the iclog header so that recovery knows where to find the tail of the log once it has found the head. Normally this is taken from the first item on the AIL - the log item that corresponds to the oldest active item in the log. The problem is that when the AIL is empty, the tail lsn is derived from the l_last_sync_lsn, which is the LSN of the last iclog to be written to the log. In most cases this doesn't happen, because the AIL is rarely empty on an active filesystem. However, when it does, it opens up an interesting case when the transaction being committed to the iclog spans multiple iclogs. That is, the first iclog is stamped with the l_last_sync_lsn, and IO is issued. Then the next iclog is set up, the changes copied into the iclog (takes some time), and then the l_last_sync_lsn is stamped into the header and IO is issued. This is still the same transaction, so the tail lsn of both iclogs must be the same for log recovery to find the entire transaction to be able to replay it. The problem arises in that the iclog buffer IO completion updates the l_last_sync_lsn with its own LSN. Therefore, if the first iclog completes its IO before the second iclog is filled and has the tail lsn stamped in it, it will stamp the LSN of the first iclog into its tail lsn field. If the system fails at this point, log recovery will not see a complete transaction, so the transaction will not be replayed. The fix is simple - the l_last_sync_lsn is updated when an iclog buffer IO completes, and this is incorrect. The l_last_sync_lsn should be updated when a transaction is completed by an iclog buffer IO. That is, only iclog buffers that have transaction commit callbacks attached to them should update the l_last_sync_lsn.
This means that the last_sync_lsn will only move forward when a commit record it written, not in the middle of a large transaction that is rolling through multiple iclog buffers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
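The rule stated in the entry above (only iclogs carrying transaction commit callbacks may advance the last sync LSN) can be modelled in a few lines. This is a toy model under stated assumptions, not the kernel code: the `toy_log` and `toy_iclog` structures and every function name are invented for illustration, and the LSNs are plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_lsn_t;

struct toy_log {
	toy_lsn_t last_sync_lsn;  /* models l_last_sync_lsn */
	bool      ail_empty;      /* true when nothing is on the AIL */
	toy_lsn_t ail_min_lsn;    /* LSN of the first AIL item, if any */
};

struct toy_iclog {
	toy_lsn_t lsn;                   /* LSN of this iclog */
	bool      has_commit_callbacks;  /* a commit record ends in this iclog */
	toy_lsn_t tail_lsn;              /* tail LSN stamped into the header */
};

/* Tail comes from the AIL, or from last_sync_lsn when the AIL is empty. */
static toy_lsn_t toy_tail_lsn(const struct toy_log *log)
{
	return log->ail_empty ? log->last_sync_lsn : log->ail_min_lsn;
}

/* Stamp the current tail into the iclog header before issuing IO. */
static void toy_iclog_stamp(const struct toy_log *log, struct toy_iclog *ic)
{
	assert(log != NULL && ic != NULL);
	ic->tail_lsn = toy_tail_lsn(log);
}

/*
 * IO completion: advance last_sync_lsn only if this iclog completes a
 * transaction, so a later iclog of the same transaction cannot be
 * stamped with a tail that has moved forward in the meantime.
 */
static void toy_iclog_io_done(struct toy_log *log, const struct toy_iclog *ic)
{
	if (ic->has_commit_callbacks)
		log->last_sync_lsn = ic->lsn;
}
```

With an empty AIL, two iclogs of one transaction now get the same stamped tail even if the first completes its IO before the second is stamped.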
#
d35e88fa |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: only update the last_sync_lsn when a transaction completes The log write code stamps each iclog with the current tail LSN in the iclog header so that recovery knows where to find the tail of the log once it has found the head. Normally this is taken from the first item on the AIL - the log item that corresponds to the oldest active item in the log. The problem is that when the AIL is empty, the tail lsn is derived from the l_last_sync_lsn, which is the LSN of the last iclog to be written to the log. In most cases this doesn't happen, because the AIL is rarely empty on an active filesystem. However, when it does, it opens up an interesting case when the transaction being committed to the iclog spans multiple iclogs. That is, the first iclog is stamped with the l_last_sync_lsn, and IO is issued. Then the next iclog is setup, the changes copied into the iclog (takes some time), and then the l_last_sync_lsn is stamped into the header and IO is issued. This is still the same transaction, so the tail lsn of both iclogs must be the same for log recovery to find the entire transaction to be able to replay it. The problem arises in that the iclog buffer IO completion updates the l_last_sync_lsn with its own LSN. Therefore, if the first iclog completes its IO before the second iclog is filled and has the tail lsn stamped in it, it will stamp the LSN of the first iclog into its tail lsn field. If the system fails at this point, log recovery will not see a complete transaction, so the transaction will not be replayed. The fix is simple - the l_last_sync_lsn is updated when an iclog buffer IO completes, and this is incorrect. The l_last_sync_lsn should be updated when a transaction is completed by an iclog buffer IO. That is, only iclog buffers that have transaction commit callbacks attached to them should update the l_last_sync_lsn.
This means that the last_sync_lsn will only move forward when a commit record is written, not in the middle of a large transaction that is rolling through multiple iclog buffers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
c75921a7 |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: xfs_quiesce_attr() should quiesce the log like unmount xfs_quiesce_attr() is supposed to leave the log empty with an unmount record written. Right now it does not wait for the AIL to be emptied before writing the unmount record, nor does it wait for metadata IO completion. Fix it to use the same method and code as xfs_log_unmount(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
5889608d |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: syncd workqueue is no more With the syncd functions moved to the log and/or removed, the syncd workqueue is the only remaining bit left. It is used by the log covering/ail pushing work, as well as by the inode reclaim work. Given how cheap workqueues are these days, give the log and inode reclaim work their own work queues and kill the syncd work queue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
cf2931db |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: Bring some sanity to log unmounting When unmounting the filesystem, there are lots of operations that need to be done in a specific order, and they are spread across a couple of functions. We have to drain the AIL before we write the unmount record, and we have to shut down the background log work before we do either of them. But this is all split haphazardly across xfs_unmountfs() and xfs_log_unmount(). Move all the AIL flushing and log manipulations to xfs_log_unmount() so that the responsibilities of each function are clear and the operations they perform obvious. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
f661f1e0 |
|
08-Oct-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: sync work is now only periodic log work The only thing the periodic sync work does now is flush the AIL and idle the log. These are really functions of the log code, so move the work to xfs_log.c and rename it appropriately. The only wart that this leaves behind is the xfssyncd_centisecs sysctl, otherwise the xfssyncd is dead. Clean up any comments related to xfssyncd to reflect its passing. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
9a8d2fdb |
|
14-Jun-2012 |
Mark Tinguely <tinguely@sgi.com> |
xfs: remove xlog_t typedef Remove the xlog_t type definitions. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
f7bdf03a |
|
14-Jun-2012 |
Mark Tinguely <tinguely@sgi.com> |
xfs: rename log structure to xlog Rename the XFS log structure to xlog to help crash distinguish it from the other logs in Linux. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
8866fc6f |
|
25-May-2012 |
Ben Myers <bpm@sgi.com> |
xfs: shutdown xfs_sync_worker before the log Revert commit 1307bbd, which uses the s_umount semaphore to provide exclusion between xfs_sync_worker and unmount, in favor of shutting down the sync worker before freeing the log in xfs_log_unmount. This is a cleaner way of resolving the race between xfs_sync_worker and unmount than using s_umount. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
ad223e60 |
|
14-Jun-2012 |
Mark Tinguely <tinguely@sgi.com> |
xfs: rename log structure to xlog Rename the XFS log structure to xlog to help crash distinguish it from the other logs in Linux. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
11159a05 |
|
25-May-2012 |
Ben Myers <bpm@sgi.com> |
xfs: shutdown xfs_sync_worker before the log Revert commit 1307bbd, which uses the s_umount semaphore to provide exclusion between xfs_sync_worker and unmount, in favor of shutting down the sync worker before freeing the log in xfs_log_unmount. This is a cleaner way of resolving the race between xfs_sync_worker and unmount than using s_umount. Signed-off-by: Ben Myers <bpm@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
77ba7877 |
|
02-Apr-2012 |
Al Viro <viro@zeniv.linux.org.uk> |
xfs: switch to proper __bitwise type for KM_... flags Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
14c26c6a |
|
24-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: add trace points for log forces To enable easy tracing of the location of log forces and the frequency of them via perf, add a pair of trace points to the log force functions. This will help debug where excessive log forces are being issued from by simple perf commands like: # ~/perf/perf top -e xfs:xfs_log_force -G -U Which gives this sort of output: Events: 141 xfs:xfs_log_force - 100.00% [kernel] [k] xfs_log_force - xfs_log_force 87.04% xfsaild kthread kernel_thread_helper - 12.87% xfs_buf_lock _xfs_buf_find xfs_buf_get xfs_trans_get_buf xfs_da_do_buf xfs_da_get_buf xfs_dir2_data_init xfs_dir2_leaf_addname xfs_dir_createname xfs_create xfs_vn_mknod xfs_vn_create vfs_create do_last.isra.41 path_openat do_filp_open do_sys_open sys_open system_call_fastpath Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sig.com>
|
#
ad1e95c5 |
|
22-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up xfs_bit.h includes With the removal of xfs_rw.h and other changes over time, xfs_bit.h is being included in many files that don't actually need it. Clean up the includes as necessary. Also move the only-used-once xfs_ialloc_find_free() static inline function out of a header file that is widely included to reduce the number of needless dependencies on xfs_bit.h. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
2a0ec1d9 |
|
22-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: move xfs_get_extsz_hint() and kill xfs_rw.h The only thing left in xfs_rw.h is a function prototype for an inode function. Move that to xfs_inode.h, and kill xfs_rw.h. Also move the function implementing the prototype from xfs_rw.c to xfs_inode.c so we only have one function left in xfs_rw.c Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
60a34607 |
|
22-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: move xfsagino_t to xfs_types.h Untangle the header file includes a bit by moving the definition of xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include xfs_ag.h. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
aa0e8833 |
|
22-Apr-2012 |
Dave Chinner <david@fromorbit.com> |
xfs: use blocks for storing the desired IO size Now that we pass block counts everywhere, and index buffers by block number and length in units of blocks, convert the desired IO size into block counts rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Rename the b_desired_count variable to something closer to its purpose - b_io_length - as it is only used to specify the length of an IO for a subset of the buffer. The only time this is used is for log IO - both writing iclogs and during log recovery. In all other cases, the b_io_length matches b_length, and hence a lot of code confuses the two. e.g. the buf item code uses the io count exclusively when it should be using the buffer length. Fix these appropriately as they are found. Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers around the desired IO length. They only serve to make the code shouty loud, don't actually add any real value, and are often used incorrectly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
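As a rough illustration of "store block counts, convert to bytes at the time of use", the sketch below keeps both lengths of a toy buffer in 512-byte basic blocks and converts only where a byte count is needed. The helper names, the `toy_buf` layout and the 512-byte block size are assumptions made for this example; they are modelled on the idea in the entry above, not copied from the XFS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed basic block size for this sketch: 512 bytes (shift of 9). */
#define TOY_BBSHIFT 9
#define TOY_BBSIZE  (1u << TOY_BBSHIFT)

/* Basic blocks to bytes. */
static inline uint64_t toy_bbtob(uint64_t bbs)
{
	return bbs << TOY_BBSHIFT;
}

/* Bytes to basic blocks, rounding up to a whole block. */
static inline uint64_t toy_btobb(uint64_t bytes)
{
	return (bytes + TOY_BBSIZE - 1) >> TOY_BBSHIFT;
}

/* Toy buffer: both lengths are kept in blocks, never in bytes. */
struct toy_buf {
	uint64_t daddr;      /* disk address, in basic blocks */
	size_t   length;     /* size of the whole buffer, in blocks */
	size_t   io_length;  /* size of the IO to issue, in blocks */
};

/* Convert at the point of use, e.g. when building the IO request. */
static inline uint64_t toy_buf_io_bytes(const struct toy_buf *bp)
{
	assert(bp != NULL);
	return toy_bbtob(bp->io_length);
}
```

Keeping `length` and `io_length` as separate fields in the same unit makes it harder to confuse the two, which is the confusion the commit message calls out.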
#
4e94b71b |
|
22-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: use blocks for counting length of buffers Now that we pass block counts everywhere, and index buffers by block number, track the length of the buffer in units of blocks rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers around the buffer length. They only serve to make the code shouty loud and don't actually add any real value. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
e70b73f8 |
|
22-Apr-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up buffer get/read call API The xfs_buf_get/read API is not consistent in the units it uses, and does not use appropriate or consistent units/types for the variables. Convert the API to use disk addresses and block counts for all buffer get and read calls. Use consistent naming for all the functions and their declarations, and convert the internal functions to use disk addresses and block counts to avoid need to convert them from one type to another and back again. Fix all the callers to use disk addresses and block counts. In many cases, this removes an additional conversion from the function call as the callers already have a block count. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
1c304625 |
|
22-Apr-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: allow assigning the tail lsn with the AIL lock held Provide a variant of xlog_assign_tail_lsn that has the AIL lock already held. By doing so we do an additional atomic_read + atomic_set under the lock, which comes down to two instructions. Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the new version to reduce the number of lock roundtrips, and prepare for a new addition that would require a third lock roundtrip in xfs_trans_ail_delete_bulk. This addition is also the reason for slightly rearranging the conditionals and relying on xfs_log_space_wake for checking that the filesystem has been shut down internally. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
3948659e |
|
21-Mar-2012 |
Dave Chinner <dchinner@redhat.com> |
xfs: Account log unmount transaction correctly There have been a few reports of this warning appearing recently: XFS (dm-4): xlog_space_left: head behind tail tail_cycle = 129, tail_bytes = 20163072 GH cycle = 129, GH bytes = 20162880 The common cause appears to be lots of freeze and unfreeze cycles, and the output from the warnings indicates that we are leaking around 8 bytes of log space per freeze/unfreeze cycle. When we freeze the filesystem, we write an unmount record and that uses xlog_write directly - a special type of transaction, effectively. What it doesn't do, however, is correctly account for the log space it uses. The unmount record writes an 8 byte structure with a special magic number into the log, and the space this consumes is not accounted for in the log ticket tracking the operation. Hence we leak 8 bytes every unmount record that is written. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
9006fb91 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: split and cleanup xfs_log_reserve Split the log regrant case out of xfs_log_reserve into a separate function, and merge xlog_grant_log_space and xlog_regrant_write_log_space into their respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily got misused before the previous cleanups with a simple boolean parameter. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
42ceedb3 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: share code for grant head availability checks Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
e179840d |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: share code for grant head wakeups Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
23ee3df3 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: share code for grant head waiting Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
a79bf2d7 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: add xlog_grant_head_wake_all Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
c303c5b8 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: add xlog_grant_head_init Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
28496968 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: add the xlog_grant_head structure Add a new data structure to allow sharing code between the log grant and regrant code. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
14a7235f |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove log space waitqueues The tic->t_wait waitqueues can never have more than a single waiter on them, so we can easily replace them with a task_struct pointer and wake_up_process. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
cfb7cdca |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: cleanup xfs_log_space_wake Remove the now unused opportunistic parameter, and use the xlog_writeq_wake and xlog_reserveq_wake helpers now that we don't have to care about the opportunistic wakeups. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
3af1de75 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: do exact log space wakeups in xlog_ungrant_log_space The only reason that xfs_log_space_wake had to do opportunistic wakeups was that the old xfs_log_move_tail calling convention didn't allow for exact wakeups when not updating the log tail LSN. Since this issue has been fixed we can do exact wakeups now. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
09a423a3 |
|
19-Feb-2012 |
Christoph Hellwig <hch@infradead.org> |
xfs: split tail_lsn assignments from log space wakeups Currently xfs_log_move_tail has a tail_lsn argument that is horribly overloaded: it may contain either an actual lsn to assign to the log tail, 0 as a special case to use the last sync LSN, or 1 to indicate that no tail LSN assignment should be performed, and we should opportunistically wake up at least one task waiting for log space even if we did not move the LSN. Remove the tail lsn assignment from xfs_log_move_tail and make the two callers use xlog_assign_tail_lsn instead of the current variant of partially using the code in xfs_log_move_tail and partially opencoding it. Note that means we grow an additional lock roundtrip on the AIL lock for each bulk update or delete, which is still far less than what we had before introducing the bulk operations. If this proves to be a problem we can still add a variant of xlog_assign_tail_lsn that expects the lock to be held already. Also rename the remainder of xfs_log_move_tail to xfs_log_space_wake as that name describes its functionality much better. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
93b8a585 |
|
06-Dec-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove the deprecated nodelaylog option The delaylog mode has been the default for a long time, and the nodelaylog option has been scheduled for removal in Linux 3.3. Remove it and code only used by it now that we have opened the 3.3 window. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
9f9c19ec |
|
28-Nov-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: fix the logspace waiting algorithm Apply the scheme used in log_regrant_write_log_space to wake up any other threads waiting for log space before the newly added one to log_regrant_write_log_space as well, and factor the code into readable helpers. For each of the queues we add two helpers: - one to try to wake up all waiting threads. This helper will also be usable by xfs_log_move_tail once we remove the current opportunistic wakeups in it. - one to sleep on t_wait until enough log space is available, loosely modelled after Linux waitqueues. And use them to reimplement the guts of log_regrant_write_log_space and log_regrant_write_log_space. These two functions now use one and the same algorithm for waiting on log space instead of subtly different ones before, with an option to completely unify them in the near future. Also move the filesystem shutdown handling to the common caller given that we had to touch it anyway. Based on hard debugging and an earlier patch from Chandra Seetharaman <sekharan@us.ibm.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com> Tested-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Ben Myers <bpm@sgi.com>
|
#
272e42b2 |
|
28-Oct-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: constify xfs_item_ops The log item ops aren't necessarily the biggest exploit vector, but marking them const is easy enough. Also remove the unused xfs_item_ops_t typedef while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
|
#
901796af |
|
10-Oct-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: clean up xfs_ioerror_alert Instead of passing the block number and mount structure explicitly get them off the bp and make the argument order more natural. Also move it to xfs_buf.c and stop printing the device name given that we already get the fs name as part of xfs_alert, and we know what device it operates on because of the caller that gets printed. Finally, rename it to xfs_buf_ioerror_alert and pass __func__ as argument where it makes sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
4347b9d7 |
|
10-Oct-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: clean up buffer allocation Change _xfs_buf_initialize to allocate the buffer directly and rename it to xfs_buf_alloc now that it is the only buffer allocation routine. Also remove the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
c867cb61 |
|
10-Oct-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
02fe03d9 |
|
22-Jul-2011 |
Chandra Seetharaman <sekharan@us.ibm.com> |
xfs: Remove the macro XFS_BUF_SET_PTR Remove the definition and usages of the macro XFS_BUF_SET_PTR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
62926044 |
|
22-Jul-2011 |
Chandra Seetharaman <sekharan@us.ibm.com> |
xfs: Remove the macro XFS_BUF_PTR Remove the definition and usages of the macro XFS_BUF_PTR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
b75e40a4 |
|
22-Jul-2011 |
Chandra Seetharaman <sekharan@us.ibm.com> |
xfs: Remove macro XFS_BUF_BUSY and family Remove the definitions and uses of the macros XFS_BUF_BUSY, XFS_BUF_UNBUSY, and XFS_BUF_ISBUSY. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
5a52c2a58 |
|
22-Jul-2011 |
Chandra Seetharaman <sekharan@us.ibm.com> |
xfs: Remove the macro XFS_BUF_ERROR and family Remove the definitions and usage of the macros XFS_BUF_ERROR, XFS_BUF_GETERROR and XFS_BUF_ISERROR. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
cb669ca5 |
|
13-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: remove wrappers around b_iodone Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
adadbeef |
|
13-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: remove wrappers around b_fspriv Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
bf9d9013 |
|
13-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: add a proper transaction pointer to struct xfs_buf Replace the typeless b_fspriv2 and the ugly macros around it with a properly typed transaction pointer. As a fallout the log buffer state debug checks are also removed. We could have kept them using casts, but as they do not have a real purpose we can as well just remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
e163cbde |
|
08-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: avoid a few disk cache flushes There is no need for a pre-flush when writing the second part of a split log buffer, and if we are using an external log there is no need to do a full cache flush of the log device at all given that all writes to it use the FUA flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
1d5ae5df |
|
08-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup I/O-related buffer flags Remove the unused and misnamed _XBF_RUN_QUEUES flag, rename XBF_LOG_BUFFER to the more fitting XBF_SYNCIO, and split XBF_ORDERED into XBF_FUA and XBF_FLUSH to allow more fine grained control over the bio flags. Also cleanup processing of the flags in _xfs_buf_ioapply to make more sense, and renumber the sparse flag number space to group flags by purpose. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
c8da0faf |
|
08-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: return the buffer locked from xfs_buf_get_uncached All other xfs_buf_get/read-like helpers return the buffer locked, make sure xfs_buf_get_uncached isn't different for no reason. Half of the callers already lock it directly after, and the others probably should also keep it locked if only for consistency and being able to use xfs_buf_rele, but I'll leave that for later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
0c842ad4 |
|
08-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: clean up buffer locking helpers Rename xfs_buf_cond_lock and reverse its return value to fit most other trylock operations in the kernel and XFS (with the exception of down_trylock, after which xfs_buf_cond_lock was modelled), and replace xfs_buf_lock_val with an xfs_buf_islocked for use in asserts, or an open-coded variant in tracing. Remove the XFS_BUF_* wrappers for all the locking helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
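The return-value reversal mentioned in the entry above is easy to get wrong, so here is a single-threaded sketch of the two conventions. Everything in it is hypothetical: `toy_sema`, `toy_down_trylock`, `toy_buf_trylock` and `toy_buf_islocked` are invented for illustration, and the counter stands in for a real semaphore. The semaphore-style helper returns 0 when the lock is taken, while the wrapper returns true on success like most other trylock helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counting semaphore; no real blocking or atomicity in this sketch. */
struct toy_sema {
	int count;  /* 1 = free, 0 = held */
};

struct toy_buf {
	struct toy_sema sema;
};

/* Semaphore-style convention: 0 if acquired, 1 if it was contended. */
static int toy_down_trylock(struct toy_sema *s)
{
	if (s->count > 0) {
		s->count--;
		return 0;
	}
	return 1;
}

static void toy_up(struct toy_sema *s)
{
	s->count++;
}

/* Trylock-style convention: true if the lock was acquired. */
static bool toy_buf_trylock(struct toy_buf *bp)
{
	assert(bp != NULL);
	return toy_down_trylock(&bp->sema) == 0;
}

/* Predicate intended for assertions rather than for taking decisions. */
static bool toy_buf_islocked(const struct toy_buf *bp)
{
	return bp->sema.count <= 0;
}

static void toy_buf_unlock(struct toy_buf *bp)
{
	toy_up(&bp->sema);
}
```

Callers can then write `if (!toy_buf_trylock(bp))` to mean "could not get the lock", which reads the same way as the other trylock helpers the commit refers to.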
#
69ef921b |
|
08-Jul-2011 |
Christoph Hellwig <hch@lst.de> |
xfs: byteswap constants instead of variables Micro-optimize various comparisons by always byteswapping the constant instead of the variable, which allows the swap to be done at compile time instead of at runtime. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
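A small userspace sketch of the micro-optimization in the entry above: both checks below give the same answer, but the second swaps the constant, which the compiler can fold, and compares the raw on-disk value directly. The `TOY_*` macros and function names are made up for this example, and the endianness detection assumes the GCC/Clang predefined `__BYTE_ORDER__` macros; where those are absent the sketch assumes a little-endian host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example magic number in CPU byte order (illustrative value). */
#define TOY_MAGIC 0x58465342u

/* Byte swap written as a pure expression so it folds for constants. */
#define TOY_SWAB32(x) \
	((uint32_t)((((uint32_t)(x) & 0x000000ffu) << 24) | \
		    (((uint32_t)(x) & 0x0000ff00u) << 8)  | \
		    (((uint32_t)(x) & 0x00ff0000u) >> 8)  | \
		    (((uint32_t)(x) & 0xff000000u) >> 24)))

/* Assumption: GCC/Clang byte-order macros; otherwise treat as little-endian. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define TOY_CPU_TO_BE32(x) ((uint32_t)(x))
#else
#define TOY_CPU_TO_BE32(x) TOY_SWAB32(x)
#endif
#define TOY_BE32_TO_CPU(x) TOY_CPU_TO_BE32(x)

/* Load four on-disk (big-endian) bytes without interpreting them. */
static uint32_t toy_load_raw32(const unsigned char *p)
{
	uint32_t raw;

	assert(p != NULL);
	memcpy(&raw, p, sizeof(raw));
	return raw;
}

/* Swaps the variable: the conversion happens on every call. */
static bool toy_magic_ok_runtime_swap(uint32_t disk_be)
{
	return TOY_BE32_TO_CPU(disk_be) == TOY_MAGIC;
}

/* Swaps the constant: the conversion can be done at compile time. */
static bool toy_magic_ok_const_swap(uint32_t disk_be)
{
	return disk_be == TOY_CPU_TO_BE32(TOY_MAGIC);
}
```

The trick applies only to equality tests; ordered comparisons still need the variable converted to CPU byte order first.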
#
a27a263b |
|
15-Jun-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: make log devices with write back caches work There's no reason not to support cache flushing on external log devices. The only thing this really requires is flushing the data device first both in fsync and log commits. A side effect is that we also have to remove the barrier write test during mount, which has been superfluous since the new FLUSH+FUA code anyway. Also use the chance to flush the RT subvolume write cache before the fsync commit, which is required for correct semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
44396476 |
|
21-Apr-2011 |
Dave Chinner <dchinner@redhat.com> |
xfs: reset buffer pointers before freeing them When we free a vmapped buffer, we need to ensure the vmap address and length we free is the same as when it was allocated. In various places in the log code we change the memory the buffer is pointing to before issuing IO, but we never reset the buffer to point back to its original memory (or no memory, if that is the case for the buffer). As a result, when we free the buffer it points to memory that is owned by something else and attempts to unmap and free it. Because the range does not match any known mapped range, it can trigger BUG_ON() traps in the vmap code, and potentially corrupt the vmap area tracking. Fix this by always resetting these buffers to their original state before freeing them. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
97d3ac75 |
|
24-Apr-2011 |
Christoph Hellwig <hch@infradead.org> |
xfs: exact busy extent tracking Update the extent tree in case we have to reuse a busy extent, so that it always is kept up to date. This is done by replacing the busy list searches with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree in case of a reuse. This lets us reuse metadata extents unconditionally, and thus avoid log forces especially for allocation btree blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
da8a1a4a |
|
07-Apr-2011 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert log tail checking to a warning On the Power platform, the log tail debug checks fire excessively causing the system to panic early in testing. The debug checks are known to be racy, though on x86_64 there is no evidence that they trigger at all. We want to keep the checks active on debug systems to alert us to problems with log space accounting, but we need to reduce the impact of a racy check on testing on the Power platform. As a result, convert the ASSERT conditions to warnings, and allow them to fire only once per filesystem mount. This will prevent false positives from interfering with testing, whilst still providing us with the indication that there may be a problem with log space accounting should that occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
|
#
fd074841 |
|
07-Apr-2011 |
Dave Chinner <dchinner@redhat.com> |
xfs: push the AIL from memory reclaim and periodic sync When we are short on memory, we want to expedite the cleaning of dirty objects. Hence when we run short on memory, we need to kick the AIL flushing into action to clean as many dirty objects as quickly as possible. To implement this, sample the lsn of the log item at the head of the AIL and use that as the push target for the AIL flush. Further, we keep items in the AIL that are dirty that are not tracked any other way, so we can get objects sitting in the AIL that don't get written back until the AIL is pushed. Hence to get the filesystem to the idle state, we might need to push the AIL to flush out any remaining dirty objects sitting in the AIL. This requires the same push mechanism as the reclaim push. This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to match the new xfs_ail_max_lsn() function introduced in this patch. Similarly for xfs_trans_ail_push -> xfs_ail_push. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com>
|
#
a0fa2b67 |
|
06-Mar-2011 |
Dave Chinner <dchinner@redhat.com> |
xfs: Convert xlog_warn to new logging interface Convert the xfs log operations to use the new error logging interfaces. This removes the xlog_{warn,panic} wrappers and makes almost all errors emit the device they belong to instead of just referring to "XFS". Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
73efe4a4 |
|
11-Jan-2011 |
Dave Chinner <dchinner@redhat.com> |
xfs: prevent NMI timeouts in cmn_err We currently have a global error message buffer in cmn_err that is protected by a spin lock that disables interrupts. Recently there have been reports of NMI timeouts occurring when the console is being flooded by SCSI error reports due to cmn_err() getting stuck trying to print to the console while holding this lock (i.e. with interrupts disabled). The NMI watchdog is seeing this CPU as non-responding and so is triggering a panic. While the trigger for the reported case is SCSI errors, pretty much anything that spams the kernel log could cause this to occur. Realistically the only reason that we have the intermediate message buffer is to prepend the correct kernel log level prefix to the log message. The only reason we have the lock is to protect the global message buffer and the only reason the message buffer is global is to keep it off the stack. Hence if we can avoid needing a global message buffer we avoid needing the lock, and we can do this with a small amount of cleanup and some preprocessor tricks: 1. clean up xfs_cmn_err() panic mask functionality to avoid needing debug code in xfs_cmn_err() 2. remove the couple of "!" message prefixes that still exist that the existing cmn_err() code steps over. 3. redefine CE_* levels directly to KERN_* 4. redefine cmn_err() and friends to use printk() directly via variable argument length macros. By doing this, we can completely remove the cmn_err() code and the lock that is causing the problems, and rely solely on printk() serialisation to ensure that we don't get garbled messages. A series of followup patches is really needed to clean up all the cmn_err() calls and related messages properly, but that results in a series that is not easily back portable to enterprise kernels. Hence this initial fix is only to address the direct problem in the lowest impact way possible.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
d0eb2f38 |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert grant head manipulations to lockless algorithm The only thing that the grant lock remains to protect is the grant head manipulations when adding or removing space from the log. These calculations are already based on atomic variables, so we can already update them safely without locks. However, the grant head manipulations require atomic multi-step calculations to be executed, which the algorithms currently don't allow. To make these multi-step calculations atomic, convert the algorithms to compare-and-exchange loops on the atomic variables. That is, we sample the old value, perform the calculation and use atomic64_cmpxchg() to attempt to update the head with the new value. If the head has not changed since we sampled it, it will succeed and we are done. Otherwise, we rerun the calculation again from a new sample of the head. This allows us to remove the grant lock from around all the grant head space manipulations, and that effectively removes the grant lock from the log completely. Hence we can remove the grant lock completely from the log at this point. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
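The sample / calculate / compare-and-exchange loop described above can be sketched in user-space C11. This is a sketch under assumptions, not the kernel code: it uses `<stdatomic.h>` in place of the kernel's `atomic64_cmpxchg()`, and the packing (cycle in the upper 32 bits, byte offset in the lower 32) and the names `head_pack()`, `head_crack()` and `head_add_space()` are chosen here for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: cycle in the upper 32 bits, byte offset in the lower 32. */
static inline uint64_t head_pack(uint32_t cycle, uint32_t space)
{
	return ((uint64_t)cycle << 32) | space;
}

static inline void head_crack(uint64_t val, uint32_t *cycle, uint32_t *space)
{
	*cycle = (uint32_t)(val >> 32);
	*space = (uint32_t)val;
}

/*
 * Advance the head by `bytes`, wrapping at `logsize` and bumping the cycle.
 * The whole multi-step calculation is retried if another thread changed
 * the head between our sample and our update.
 */
static void head_add_space(_Atomic uint64_t *head, uint32_t logsize,
			   uint32_t bytes)
{
	uint64_t old, newval;
	uint32_t cycle, space, room;

	assert(bytes <= logsize);

	old = atomic_load(head);
	do {
		head_crack(old, &cycle, &space);
		room = logsize - space;
		if (room > bytes) {
			space += bytes;
		} else {
			space = bytes - room;
			cycle++;
		}
		newval = head_pack(cycle, space);
		/* On failure, `old` is refreshed with the current head value. */
	} while (!atomic_compare_exchange_weak(head, &old, newval));
}
```

Because the cycle and offset live in one 64-bit word, a single compare-and-exchange publishes both halves together, so no reader can observe a new offset paired with a stale cycle.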
|
#
3f16b985 |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce new locks for the log grant ticket wait queues The log grant ticket wait queues are currently protected by the log grant lock. However, the queues are functionally independent from each other, and operations on them only require serialisation against other queue operations now that all of the other log variables they use are atomic values. Hence, we can make them independent of the grant lock by introducing new locks just to protect the list operations. Because the lists are independent, we can use a lock per list and ensure that reserve and write head queuing do not contend. To ensure forced shutdowns work correctly in conjunction with the new fast paths, ensure that we check whether the log has been shut down in the grant functions once we hold the relevant spin locks but before we go to sleep. This is needed to co-ordinate correctly with the wakeups that are issued on the ticket queues so we don't leave any processes sleeping on the queues during a shutdown. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
c8a09ff8 |
|
03-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert log grant heads to atomic variables Convert the log grant heads to atomic64_t types in preparation for converting the accounting algorithms to atomic operations. This patch just converts the variables; the algorithmic changes are in a separate patch for clarity. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
1c3cb9ec |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert l_tail_lsn to an atomic variable. log->l_tail_lsn is currently protected by the log grant lock. The lock is only needed for serialising readers against writers, so we don't really need the lock if we make the l_tail_lsn variable an atomic. Converting the l_tail_lsn variable to an atomic64_t means we can start to peel back the grant lock from various operations. Also, provide functions to safely crack an atomic LSN variable into its component pieces and to recombine the components into an atomic variable. Use them where appropriate. This also removes the need for explicitly holding a spinlock to read the l_tail_lsn on 32 bit platforms. Signed-off-by: Dave Chinner <dchinner@redhat.com>
|
#
84f3c683 |
|
03-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert l_last_sync_lsn to an atomic variable log->l_last_sync_lsn is updated in only one critical spot - log buffer IO completion - and is protected by the grant lock here. This requires the grant lock to be taken for every log buffer IO completion. Converting the l_last_sync_lsn variable to an atomic64_t means that we do not need to take the grant lock in log buffer IO completion to update it. This also removes the need for explicitly holding a spinlock to read the l_last_sync_lsn on 32 bit platforms. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
2ced19cb |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: make AIL tail pushing independent of the grant lock The xlog_grant_push_ail() currently takes the grant lock internally to sample the tail lsn, last sync lsn and the reserve grant head. Most of the callers already hold the grant lock but have to drop it before calling xlog_grant_push_ail(). This is a left over from when the AIL tail pushing was done in line and hence xlog_grant_push_ail had to drop the grant lock. AIL push is now done in another thread and hence we can safely hold the grant lock over the entire xlog_grant_push_ail call. Push the grant lock outside of xlog_grant_push_ail() to simplify the locking and synchronisation needed for tail pushing. This will reduce traffic on the grant lock by itself, but this is only one step in preparing for the complete removal of the grant lock. While there, clean up the formatting of xlog_grant_push_ail() to match the rest of the XFS code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
eb40a875 |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: use wait queues directly for the log wait queues The log grant queues are one of the few places left using sv_t constructs for waiting. Given we are touching this code, we should convert them to plain wait queues. While there, convert all the other sv_t users in the log code as well. Seeing as this removes the last users of the sv_t type, remove the header file defining the wrapper and the fragments that still reference it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a69ed03c |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: combine grant heads into a single 64 bit integer Prepare for switching the grant heads to atomic variables by combining the two 32 bit values that make up the grant head into a single 64 bit variable. Provide wrapper functions to combine and split the grant heads appropriately for calculations and use them as necessary. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
663e496a |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: rework log grant space calculations The log grant space calculations are repeated for both write and reserve grant heads. To make it simpler to convert the calculations to a different algorithm, factor them so both the grant heads use the same calculation functions. Once this is done we can drop the wrappers that are used in only a couple of places to update both grant heads at once as they don't provide any particular value. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
3f336c6f |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: fact out common grant head/log tail verification code Factor repeated debug code out of grant head manipulation functions into a separate function. This removes ifdef DEBUG spaghetti from the code and makes the code easier to follow. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
10547941 |
|
20-Dec-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: convert log grant ticket queues to list heads The grant write and reserve queues use a roll-your-own double linked list, so convert it to a standard list_head structure and convert all the list traversals to use list_for_each_entry(). We can also get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty() check to tell if the ticket is in a list or not. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
1a1a3e97 |
|
06-Oct-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove xfs_buf wrappers Stop having two different names for many buffer functions and use the more descriptive xfs_buf_* names directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
686865f7 |
|
24-Sep-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: rename xfs_buf_get_nodaddr to be more appropriate xfs_buf_get_nodaddr() is really used to allocate a buffer that is uncached. While it is not directly assigned a disk address, the fact that they are not cached is a more important distinction. With the upcoming uncached buffer read primitive, we should be consistent with this distinction. While there, make page allocation in xfs_buf_get_nodaddr() safe against memory reclaim re-entrancy into the filesystem by allowing a flags parameter to be passed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com>
|
#
80f6c29d |
|
18-Aug-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: replace barriers with explicit flush / FUA usage Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP detection for barriers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
a44f13ed |
|
23-Aug-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: Reduce log force overhead for delayed logging Delayed logging adds some serialisation to the log force process to ensure that it does not dereference a bad commit context structure when determining if a CIL push is necessary or not. It does this by grabbing the CIL context lock exclusively, then dropping it before pushing the CIL if necessary. This causes serialisation of all log forces and pushes regardless of whether a force is necessary or not. As a result fsync heavy workloads (like dbench) can be significantly slower with delayed logging than without. To avoid this penalty, copy the current sequence from the context to the CIL structure when they are swapped. This allows us to do unlocked checks on the current sequence without having to worry about dereferencing context structures that may have already been freed. Hence we can remove the CIL context locking in the forcing code and only call into the push code if the current context matches the sequence we need to force. By passing the sequence into the push code, we can check the sequence again once we have the CIL lock held exclusive and abort if the sequence has already been pushed. This avoids a lock round-trip and unnecessary CIL pushes when we have racing push calls. The result is that the regression in dbench performance goes away - this change improves dbench performance on a ramdisk from ~2100MB/s to ~2500MB/s. This compares favourably to not using delayed logging which returns ~2500MB/s for the same workload. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
73523a2e |
|
20-Jul-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: fix gcc 4.6 set but not read and unused statement warnings [hch: dropped a few hunks that need structural changes instead] Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
651701d7 |
|
28-Jun-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove incorrect log write optimization We do need a barrier for the first buffer of a split log write. Otherwise we might incorrectly stamp the tail LSN into transactions in the first part of the split write, or not flush data I/O before updating the inode size. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
9134c233 |
|
23-Jun-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove the unused XFS_LOG_SLEEP and XFS_LOG_NOSLEEP flags Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
4e0d5f92 |
|
23-Jun-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: fix the xfs_log_iovec i_addr type By making this member a void pointer we can get rid of a lot of pointless casts. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
3400777f |
|
23-Jun-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: remove unneeded #include statements Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
|
#
288699fe |
|
23-Jun-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: drop dmapi hooks Dmapi support was never merged upstream, but we still have a lot of hooks bloating XFS for it, all over the fast paths of the filesystem. This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM support in mainline at least the namespace events can be done much saner in the VFS instead of the individual filesystem, so it's not like this is much help for future work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
9da1ab18 |
|
16-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: forced unmounts need to push the CIL If the filesystem is being shut down and there is no log error, the current code forces out the current log buffers. This code now needs to push the CIL before it forces out the log buffers to achieve the same result. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
71e330b5 |
|
20-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: Introduce delayed logging core code The delayed logging code only changes in-memory structures and as such can be enabled and disabled with a mount option. Add the mount option and emit a warning that this is an experimental feature that should not be used in production yet. We also need infrastructure to track committed items that have not yet been written to the log. This is what the Committed Item List (CIL) is for. The log item also needs to be extended to track the current log vector, the associated memory buffer and its location in the Commit Item List. Extend the log item and log vector structures to enable this tracking. To maintain the current log format for transactions with delayed logging, we need to introduce a checkpoint transaction and a context for tracking each checkpoint from initiation to transaction completion. This includes adding a log ticket for tracking log space required/used by the context checkpoint. To track all the changes we need an io vector array per log item, rather than a single array for the entire transaction. Using the new log vector structure for this requires two passes - the first to allocate the log vector structures and chain them together, and the second to fill them out. This log vector chain can then be passed to the CIL for formatting, pinning and insertion into the CIL. Formatting of the log vector chain is relatively simple - it's just a loop over the iovecs on each log vector, but it is made slightly more complex because we re-write the iovec after the copy to point back at the memory buffer we just copied into. This code also needs to pin log items. If the log item is not already tracked in this checkpoint context, then it needs to be pinned. Otherwise it is already pinned and we don't need to pin it again. The only other complexity is calculating the amount of new log space the formatting has consumed.
This needs to be accounted to the transaction in progress, and the accounting is made more complex because we need also to steal space from it for log metadata in the checkpoint transaction. Calculate all this at insert time and update all the tickets, counters, etc correctly. Once we've formatted all the log items in the transaction, attach the busy extents to the checkpoint context so the busy extents live until checkpoint completion and can be processed at that point in time. Transactions can then be freed at this point in time. Now we need to issue checkpoints - we are tracking the amount of log space used by the items in the CIL, so we can trigger background checkpoints when the space usage gets to a certain threshold. Otherwise, checkpoints need to be triggered when a log synchronisation point is reached - a log force event. Because the log write code already handles chained log vectors, writing the transaction is trivial, too. Construct a transaction header, add it to the head of the chain and write it into the log, then issue a commit record write. Then we can release the checkpoint log ticket and attach the context to the log buffer so it can be called during IO completion to complete the checkpoint. We also need to allow for synchronising multiple in-flight checkpoints. This is needed for two things - the first is to ensure that checkpoint commit records appear in the log in the correct sequence order (so they are replayed in the correct order). The second is so that xfs_log_force_lsn() operates correctly and only flushes and/or waits for the specific sequence it was provided with. To do this we need a wait variable and a list tracking the checkpoint commits in progress. We can walk this list and wait for the checkpoints to change state or complete easily, and this provides the necessary synchronisation for correct operation in both cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
955833cf |
|
14-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: make the log ticket ID available outside the log infrastructure The ticket ID is needed to uniquely identify transactions when doing busy extent matching. Delayed logging changes the lifecycle of busy extents with respect to the transaction structure lifecycle. Hence we can no longer use the transaction structure as a means of determining the owner of the busy extent as it may be freed and reused while the busy extent is still active. This commit provides the infrastructure to access the xlog_tid_t held in the ticket from a transaction handle. This avoids the need for callers to peek into the transaction and log structures to find this out. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
169a7b07 |
|
06-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: clean up log ticket overrun debug output Push the error message output when a ticket overrun is detected into the ticket printing functions. Also remove the debug version of the code as the production version will still panic just as effectively on a debug kernel via the panic mask being set. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
3383ca57 |
|
06-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: allow log ticket allocation to take allocation flags Delayed logging currently requires ticket allocation to succeed, so we need to be able to sleep on allocation. It also should not allow memory allocation to recurse into the filesystem. Hence we need to pass allocation flags directing the type of allocation the caller requires. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
524ee36f |
|
06-May-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: Don't reuse the same transaction ID for duplicated transactions. The transaction ID is written into the log as the unique identifier for transactions during recovery. When duplicating a transaction, we reuse the log ticket, which means it has the same transaction ID as the previous transaction. Rather than regenerating a random transaction ID for the duplicated transaction, just add one to the current ID so that duplicated transactions can be easily spotted in the log and during recovery when diagnosing problems. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
48389ef1 |
|
20-Apr-2010 |
Alex Elder <aelder@sgi.com> |
xfs: kill off l_sectbb_mask There remains only one user of the l_sectbb_mask field in the log structure. Just kill it off and compute the mask where needed from the power-of-2 sector size. (Only update from last post is to accommodate the changes in the previous patch in the series.) Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
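Computing the mask on demand from a power-of-2 size is a standard idiom; a minimal user-space sketch follows. The function name `round_down_to_sector()` and its parameter names are hypothetical and only illustrate the arithmetic, not the helper the patch touches.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a basic-block number down to a sector boundary.
 * For a power-of-2 size, (size - 1) is the mask of the low-order bits,
 * so no separately stored mask field is needed.
 */
static inline uint64_t round_down_to_sector(uint64_t blk_no,
					    uint32_t sect_bbsize)
{
	/* The idiom is only valid for non-zero powers of two. */
	assert(sect_bbsize != 0 && (sect_bbsize & (sect_bbsize - 1)) == 0);

	return blk_no & ~((uint64_t)sect_bbsize - 1);
}
```

With a sector size of 8 basic blocks, block 13 rounds down to block 8, while block 16 is already aligned and is returned unchanged.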
|
#
69ce58f0 |
|
20-Apr-2010 |
Alex Elder <aelder@sgi.com> |
xfs: record log sector size rather than log2(that) Change struct log so it keeps track of the size (in basic blocks) of a log sector in l_sectBBsize rather than the log-base-2 of that value (previously, l_sectbb_log). The name was chosen for consistency with the other fields in the structure that represent a number of basic blocks. (Updated so that a variable used in computing and verifying a log's sector size is named "log2_size". Also added the "BB" to the structure field name, based on feedback from Eric Sandeen. Also dropped some superfluous parentheses.) Signed-off-by: Alex Elder <aelder@sgi.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
|
#
f9837107 |
|
13-Apr-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: make the log ticket transaction id random The transaction ID that is written to the log for a transaction is currently set by taking the lower 32 bits of the memory address of the ticket structure. This is not guaranteed to be unique as tickets comes from a slab and slots can be reallocated immediately after being freed. As a result, there is no guarantee of uniqueness in the ticket ID value. Fix this by assigning a random number to the ticket ID field so that it is extremely unlikely that duplicates will occur and remove the possibility of transactions being mixed up during recovery due to duplicate IDs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
e6b1f273 |
|
22-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: clean up xlog_write_adv_cnt Replace the awkward xlog_write_adv_cnt with an inline helper that makes it more obvious that it's modifying its parameters, and replace the use of an integer type for "ptr" with a real void pointer. Also move xlog_write_adv_cnt to xfs_log_priv.h as it will be used outside of xfs_log.c in the delayed logging series. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
55b66332 |
|
22-Mar-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: introduce new internal log vector structure The current log IO vector structure is a flat array and not extensible. To make it possible to keep separate log IO vectors for individual log items, we need a method of chaining log IO vectors together. Introduce a new log vector type that can be used to wrap the existing log IO vectors and use that internally to the log. This means that the existing external interface (xfs_log_write) does not change and hence no changes to the transaction commit code are required. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
99428ad0 |
|
22-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: reindent xlog_write Reindent xlog_write to normal one tab indents and move all variable declarations into the closest enclosing block. Split from a bigger patch by Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
b5203cd0 |
|
22-Mar-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor xlog_write xlog_write is a mess that takes a lot of effort to understand. It is a mass of nested loops with 4 space indents to get it to fit in 80 columns and lots of funky variables that aren't obvious what they mean or do. Break it down into understandable chunks. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
|
#
9b9fc2b7 |
|
22-Mar-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: log ticket reservation underestimates the number of iclogs When allocating a ticket for a transaction, the ticket is initialised with the worst case log space usage based on the number of bytes the transaction may consume. Part of this calculation is the number of log headers required for the iclog space used up by the transaction. This calculation makes an undocumented assumption that if the transaction uses the log header space reservation on an iclog, then it consumes either the entire iclog or it completes. That is - the transaction that is first in an iclog is the transaction that the log header reservation is accounted to. If the transaction is larger than the iclog, then it will use the entire iclog itself. Document this assumption. Further, the current calculation uses the rule that we can fit iclog_size bytes of transaction data into an iclog. This is incorrect - the amount of space available in an iclog for transaction data is the size of the iclog minus the space used for log record headers. This means that the calculation is out by 512 bytes per 32k of log space the transaction can consume. This is rarely an issue because maximally sized transactions are extremely uncommon, and for 4k block size filesystems maximal transaction reservations are about 400kb. Hence the error in this case is less than the size of an iclog, so that makes it even harder to hit. However, anyone using larger directory blocks (16k directory blocks push the maximum transaction size to approx. 900k on a 4k block size filesystem) or larger block size (e.g. 64k blocks push transactions to the 3-4MB size) could see the error grow to more than an iclog and at this point the transaction is guaranteed to get a reservation underrun and shut down the filesystem. Fix this by adjusting the calculation to calculate the correct number of iclogs required and account for them all up front. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
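The difference between the two rules can be checked with a little arithmetic. The sketch below is a simplification under assumptions: it only compares "data fills the whole iclog" against "data fills the iclog minus its record header", and ignores the other overheads a real reservation includes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling division: how many units of size y are needed to hold x bytes. */
static uint32_t howmany_u32(uint32_t x, uint32_t y)
{
	return (x + y - 1) / y;
}

/* Old rule as described above: assume iclog_size bytes of data per iclog. */
static uint32_t iclogs_naive(uint32_t unit_bytes, uint32_t iclog_size)
{
	return howmany_u32(unit_bytes, iclog_size);
}

/* Corrected rule: only (iclog_size - hdr_size) bytes per iclog carry data. */
static uint32_t iclogs_fixed(uint32_t unit_bytes, uint32_t iclog_size,
			     uint32_t hdr_size)
{
	assert(hdr_size < iclog_size);
	return howmany_u32(unit_bytes, iclog_size - hdr_size);
}
```

For a 400 KiB reservation with 32 KiB iclogs and 512-byte headers both rules give 13 iclogs, which fits the observation that the error is rarely hit. For a 3.5 MiB reservation the old rule gives 112 and the corrected rule gives 114, so the old estimate comes up short.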
|
#
43f5efc5 |
|
22-Mar-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: factor log item initialisation Each log item type does manual initialisation of the log item. Delayed logging introduces new fields that need initialisation, so factor all the open coded initialisation into a common function first. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
b6f8dd49 |
|
12-Apr-2010 |
Dave Chinner <dchinner@redhat.com> |
xfs: ensure that sync updates the log tail correctly Updates to the VFS layer removed an extra ->sync_fs call into the filesystem during the sync process (from the quota code). Unfortunately the sync code was unknowingly relying on this call to make sure metadata buffers were flushed via a xfs_buftarg_flush() call to move the tail of the log forward in memory before the final transactions of the sync process were issued. As a result, the old code would write a very recent log tail value to the log by the end of the sync process, and so a subsequent crash would leave nothing for log recovery to do. Hence in qa test 182, log recovery only replayed a small handful of inode fsync transactions in this case. However, with the removal of the extra ->sync_fs call, the log tail was now not moved forward with the inode fsync transactions near the end of the sync process - the first (and only) buftarg flush occurred after these transactions went to disk. The result is that log recovery now sees a large number of transactions for metadata that is already on disk. This usually isn't a problem, but when the transactions include inode chunk allocation, the inode create transactions and all subsequent changes are replayed as we cannot rely on what is on disk being valid. As a result, if the inode was written and contains unlogged changes, the unlogged changes are lost, thereby violating sync semantics. The fix is to always issue a transaction after the buftarg flush occurs if the log is not idle or covered. This results in a dummy transaction being written that contains the up-to-date log tail value, which will be very recent. Indeed, it will be at least as recent as the old code would have left on disk, so log recovery will behave exactly as it used to in this situation. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
35a8a72f |
|
15-Feb-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: stop passing opaque handles to xfs_log.c routines Currently we pass opaque xfs_log_ticket_t handles instead of struct xlog_ticket pointers, and void pointers instead of struct xlog_in_core pointers to various log manager functions. Instead pass properly typed pointers after adding forward declarations for them to xfs_log.h, and adjust the touched function prototypes to the standard XFS style while at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
a14a348b |
|
19-Jan-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: cleanup up xfs_log_force calling conventions Remove the XFS_LOG_FORCE argument which was always set, and the XFS_LOG_URGE define, which was never used. Split xfs_log_force into two helpers - xfs_log_force which forces the whole log, and xfs_log_force_lsn which forces up to the specified LSN. The underlying implementations already were entirely separate, as were the users. Also re-indent the new _xfs_log_force/_xfs_log_force which previously had a weird coding style. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
4139b3b3 |
|
19-Jan-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: kill XLOG_VEC_SET_TYPE This macro only obfuscates the log item type assignments, so kill it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
873ff550 |
|
13-Jan-2010 |
Christoph Hellwig <hch@infradead.org> |
xfs: clean up log buffer writes Don't bother using XFS_bwrite as it doesn't provide much code for our use case. Instead opencode it and fold xlog_bdstrat_cb into the new xlog_bdstrat helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
2ee1abad |
|
24-Nov-2009 |
Dave Chinner <dgc@sgi.com> |
xfs: improve metadata I/O merging in the elevator Change all async metadata buffers to use [READ|WRITE]_META I/O types so that the I/O doesn't get issued immediately. This allows merging of adjacent metadata requests but still prioritises them over bulk data. This shows a 10-15% improvement in sequential create speed of small files. Don't include the log buffers in this classification - leave them as sync types so they are issued immediately. Signed-off-by: Dave Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
0b1b213f |
|
14-Dec-2009 |
Christoph Hellwig <hch@infradead.org> |
xfs: event tracing support Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is described in more detail. To read the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads hit various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. 
And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile | 8 fs/xfs/linux-2.6/xfs_acl.c | 1 fs/xfs/linux-2.6/xfs_aops.c | 52 - fs/xfs/linux-2.6/xfs_aops.h | 2 fs/xfs/linux-2.6/xfs_buf.c | 117 +-- fs/xfs/linux-2.6/xfs_buf.h | 33 fs/xfs/linux-2.6/xfs_fs_subr.c | 3 fs/xfs/linux-2.6/xfs_ioctl.c | 1 fs/xfs/linux-2.6/xfs_ioctl32.c | 1 fs/xfs/linux-2.6/xfs_iops.c | 1 fs/xfs/linux-2.6/xfs_linux.h | 1 fs/xfs/linux-2.6/xfs_lrw.c | 87 -- fs/xfs/linux-2.6/xfs_lrw.h | 45 - fs/xfs/linux-2.6/xfs_super.c | 104 --- fs/xfs/linux-2.6/xfs_super.h | 7 fs/xfs/linux-2.6/xfs_sync.c | 1 fs/xfs/linux-2.6/xfs_trace.c | 75 ++ fs/xfs/linux-2.6/xfs_trace.h | 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h | 4 fs/xfs/quota/xfs_dquot.c | 110 --- fs/xfs/quota/xfs_dquot.h | 21 fs/xfs/quota/xfs_qm.c | 40 - fs/xfs/quota/xfs_qm_syscalls.c | 4 fs/xfs/support/ktrace.c | 323 --------- fs/xfs/support/ktrace.h | 85 -- fs/xfs/xfs.h | 16 fs/xfs/xfs_ag.h | 14 fs/xfs/xfs_alloc.c | 230 +----- fs/xfs/xfs_alloc.h | 27 fs/xfs/xfs_alloc_btree.c | 1 fs/xfs/xfs_attr.c | 107 --- fs/xfs/xfs_attr.h | 10 fs/xfs/xfs_attr_leaf.c | 14 fs/xfs/xfs_attr_sf.h | 40 - fs/xfs/xfs_bmap.c | 507 +++------------ fs/xfs/xfs_bmap.h | 49 - fs/xfs/xfs_bmap_btree.c | 6 fs/xfs/xfs_btree.c | 5 fs/xfs/xfs_btree_trace.h | 17 fs/xfs/xfs_buf_item.c | 87 -- fs/xfs/xfs_buf_item.h | 20 fs/xfs/xfs_da_btree.c | 3 fs/xfs/xfs_da_btree.h | 7 fs/xfs/xfs_dfrag.c | 2 fs/xfs/xfs_dir2.c | 8 fs/xfs/xfs_dir2_block.c | 20 fs/xfs/xfs_dir2_leaf.c | 21 fs/xfs/xfs_dir2_node.c | 27 fs/xfs/xfs_dir2_sf.c | 26 fs/xfs/xfs_dir2_trace.c | 216 ------ fs/xfs/xfs_dir2_trace.h | 72 -- fs/xfs/xfs_filestream.c | 8 fs/xfs/xfs_fsops.c | 2 fs/xfs/xfs_iget.c | 111 --- fs/xfs/xfs_inode.c | 67 -- fs/xfs/xfs_inode.h | 76 -- fs/xfs/xfs_inode_item.c | 5 fs/xfs/xfs_iomap.c | 85 -- fs/xfs/xfs_iomap.h | 8 fs/xfs/xfs_log.c | 181 +---- fs/xfs/xfs_log_priv.h | 20 
fs/xfs/xfs_log_recover.c | 1 fs/xfs/xfs_mount.c | 2 fs/xfs/xfs_quota.h | 8 fs/xfs/xfs_rename.c | 1 fs/xfs/xfs_rtalloc.c | 1 fs/xfs/xfs_rw.c | 3 fs/xfs/xfs_trans.h | 47 + fs/xfs/xfs_trans_buf.c | 62 - fs/xfs/xfs_vnodeops.c | 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
a8914f3a |
|
10-Aug-2009 |
Christoph Hellwig <hch@lst.de> |
xfs: fix spin_is_locked assert on uni-processor builds Without SMP or preemption spin_is_locked always returns false, so we can't do an assert with it. Instead use assert_spin_locked, which does the right thing on all builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Johannes Engel <jcnengel@googlemail.com> Tested-by: Johannes Engel <jcnengel@googlemail.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
|
#
79dd43bb |
|
10-Aug-2009 |
Christoph Hellwig <hch@lst.de> |
xfs: fix spin_is_locked assert on uni-processor builds Without SMP or preemption spin_is_locked always returns false, so we can't do an assert with it. Instead use assert_spin_locked, which does the right thing on all builds. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Johannes Engel <jcnengel@googlemail.com> Tested-by: Johannes Engel <jcnengel@googlemail.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
|
#
9d7fef74 |
|
06-Apr-2009 |
Dave Chinner <david@fromorbit.com> |
xfs: inform the xfsaild of the push target before sleeping When trying to reserve log space, we find the amount of space we need, then go to sleep waiting for space. When we are woken, we try to push the tail of the log forward to make sure we have space available. Unfortunately, this means that if there is not space available, and everyone who needs space goes to sleep there is no-one left to push the tail of the log to make space available. Once we have a thread waiting for space to become available, the others queue up behind it in a FIFO, and none of them push the tail of the log. This can result in everyone going to sleep in xlog_grant_log_space() if the first sleeper races with the last I/O that moves the tail of the log forward. With no further I/O to move the tail of the log, there is nothing to wake the sleepers and hence all transactions just stop. Fix this by making sure the xfsaild will create enough space for the transaction that is about to sleep by moving the push target far enough forwards to ensure that the current process will have enough space available when it is woken. That is, we push the AIL before we go to sleep. Because we've inserted the log ticket into the queue before we've pushed and gone to sleep, subsequent transactions will wait behind this one. Hence we are guaranteed to have space available when we are woken. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
a6cb767e |
|
06-Apr-2009 |
Dave Chinner <david@fromorbit.com> |
xfs: validate log feature fields correctly If the large log sector size feature bit is set in the superblock by accident (say disk corruption), then the fields that are now considered valid are not checked on production kernels. The checks are present as ASSERT statements so cause a panic on a debug kernel. Change this so that the fields are validity checked if the feature bit is set and abort the log mount if the fields do not contain valid values. Reported-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9da096fd |
|
29-Mar-2009 |
Malcolm Parsons <malcolm.parsons@gmail.com> |
xfs: fix various typos Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
21b699c8 |
|
16-Mar-2009 |
Christoph Hellwig <hch@lst.de> |
xfs: cleanup log unmount handling Kill the current xfs_log_unmount wrapper and opencode the two function calls in the only caller. Rename the current xfs_log_unmount_dealloc to xfs_log_unmount as it undoes xfs_log_mount and the new name makes that more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
|
#
26430752 |
|
12-Feb-2009 |
Christoph Hellwig <hch@lst.de> |
xfs: fix error handling in xfs_log_mount We can't just call xfs_log_unmount_dealloc on any failure because the ail thread which is torn down by xfs_log_unmount_dealloc might not be initialized yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reported-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
7153f8ba |
|
09-Feb-2009 |
Christoph Hellwig <hch@lst.de> |
xfs: remove iclog calculation special cases Our default has been to always use 8 32KB log buffers for a while now, so remove the special casing for larger block size filesystem to use the same or even lower number of buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com>
|
#
39e2defe |
|
02-Dec-2008 |
Christoph Hellwig <hch@lst.de> |
reduce l_icloglock roundtrips All but one caller of xlog_state_want_sync drop and re-acquire l_icloglock around the call to it, just so that xlog_state_want_sync can acquire and drop it. Move all lock operations out of xlog_state_want_sync and assert that the lock is held when it is called. Note that it would make sense to extend this scheme to xlog_state_release_iclog, but the locking in there is more complicated and we'd like to keep the atomic_dec_and_lock optimization for those callers not having l_icloglock yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>
|
#
b28708d6 |
|
27-Nov-2008 |
Christoph Hellwig <hch@lst.de> |
[XFS] sanitize xlog_in_core_t definition Move all fields from xlog_iclog_fields_t into xlog_in_core_t instead of having them in a substructure and then using #defines to make it look like they were directly in xlog_in_core_t. Also document that xlog_in_core_2_t is grossly misnamed, and make all references to it typesafe. (First sent on September 15th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>
|
#
bac8dca9 |
|
27-Nov-2008 |
Christoph Hellwig <hch@infradead.org> |
[XFS] fix NULL pointer dereference in xfs_log_force_umount xfs_log_force_umount may be called very early during log recovery. If we fail a buffer read in xlog_recover_do_inode_trans we abort the mount. But at that point log recovery has started delayed writeback of inode buffers. As part of the aborted mount we try to flush out all delwri buffers, but at that point we have already freed the superblock, and set mp->m_sb_bp to NULL, and xfs_log_force_umount which gets called after the inode buffer writeback trips over it. Make xfs_log_force_umount a little more careful when accessing mp->m_sb_bp to avoid this. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Niv Sardi <xaiki@sgi.com>
|
#
cc09c0dc |
|
16-Nov-2008 |
Dave Chinner <david@fromorbit.com> |
[XFS] Fix double free of log tickets When an I/O error occurs during an intermediate commit on a rolling transaction, xfs_trans_commit() will free the transaction structure and the related ticket. However, the duplicate transaction that gets used as the transaction continues still contains a pointer to the ticket. Hence when the duplicate transaction is cancelled and freed, we free the ticket a second time. Add reference counting to the ticket so that we hold an extra reference to the ticket over the transaction commit. We drop the extra reference once we have checked that the transaction commit did not return an error, thus avoiding a double free on commit error. Credit to Nick Piggin for tripping over the problem. SGI-PV: 989741 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
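The reference-counting scheme described above can be illustrated with a small userspace sketch. This is not the kernel code: `sketch_ticket`, `sketch_ticket_get` and `sketch_ticket_put` are hypothetical names, C11 atomics stand in for the kernel's atomic_t helpers, and the `free_count` pointer exists only so the example can observe when the ticket is freed.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a log ticket; not the real struct xlog_ticket. */
struct sketch_ticket {
	atomic_int ref;		/* number of holders of this ticket */
	int *free_count;	/* illustration only: counts actual frees */
};

/* Allocate a ticket holding a single reference; returns NULL on failure. */
static struct sketch_ticket *sketch_ticket_alloc(int *free_count)
{
	struct sketch_ticket *t = malloc(sizeof(*t));

	if (!t)
		return NULL;
	atomic_init(&t->ref, 1);
	t->free_count = free_count;
	return t;
}

/* Take an extra reference, e.g. before an intermediate commit. */
static struct sketch_ticket *sketch_ticket_get(struct sketch_ticket *t)
{
	atomic_fetch_add(&t->ref, 1);
	return t;
}

/* Drop a reference; only the holder of the last one frees the ticket. */
static void sketch_ticket_put(struct sketch_ticket *t)
{
	/* atomic_fetch_sub returns the value before the decrement. */
	if (atomic_fetch_sub(&t->ref, 1) == 1) {
		(*t->free_count)++;
		free(t);
	}
}
```

With the extra reference held across the commit, a failing commit and the later cancel of the duplicate transaction each drop one reference, so the ticket is freed exactly once.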
|
#
8f330f51 |
|
09-Nov-2008 |
Dave Chinner <david@fromorbit.com> |
[XFS] handle memory allocation failures during log initialisation When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
644c3567 |
|
09-Nov-2008 |
Dave Chinner <david@fromorbit.com> |
[XFS] handle memory allocation failures during log initialisation When there is no memory left in the system, xfs_buf_get_noaddr() can fail. If this happens at mount time during xlog_alloc_log() we fail to catch the error and oops. Catch the error from xfs_buf_get_noaddr(), and allow other memory allocations to fail and catch those errors too. Report the error to the console and fail the mount with ENOMEM. Tested by manually injecting errors into xfs_buf_get_noaddr() and xlog_alloc_log(). Version 2: o remove unnecessary casts of the returned pointer from kmem_zalloc() SGI-PV: 987246 Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
783a2f65 |
|
30-Oct-2008 |
David Chinner <david@fromorbit.com> |
[XFS] Finish removing the mount pointer from the AIL API Change all the remaining AIL API functions that are passed struct xfs_mount pointers to pass pointers directly to the struct xfs_ail being used. With this conversion, all external access to the AIL is via the struct xfs_ail. Hence the operation and referencing of the AIL is almost entirely independent of the xfs_mount that is using it - it is now much more tightly tied to the log and the items it is tracking in the log than it is tied to the xfs_mount. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32353a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
#
a9c21c1b |
|
30-Oct-2008 |
David Chinner <david@fromorbit.com> |
[XFS] Give the log a pointer to the AIL When we need to go from the log to the AIL, we have to go via the xfs_mount. Add a xfs_ail pointer to the log so we can go directly to the AIL associated with the log. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32351a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
#
c7e8f268 |
|
30-Oct-2008 |
David Chinner <david@fromorbit.com> |
[XFS] Move the AIL lock into the struct xfs_ail Bring the ail lock inside the struct xfs_ail. This means the AIL can be entirely manipulated via the struct xfs_ail rather than needing both the struct xfs_mount and the struct xfs_ail. SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32350a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
#
5b00f14f |
|
30-Oct-2008 |
David Chinner <david@fromorbit.com> |
[XFS] move the AIL traversal over to a consistent interface With the new cursor interface, it makes sense to make all the traversing code use the cursor interface and make the old one go away. This means more of the AIL interfacing is done by passing struct xfs_ail pointers around the place instead of struct xfs_mount pointers. We can replace the use of xfs_trans_first_ail() in xfs_log_need_covered() as it is only checking if the AIL is empty. We can do that with a call to xfs_trans_ail_tail() instead, where a zero LSN returned indicates an empty AIL... SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32348a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
#
27d8d5fe |
|
30-Oct-2008 |
David Chinner <david@fromorbit.com> |
[XFS] Use a cursor for AIL traversal. Replace the current generation number that ensures sanity of the AIL traversal with an external cursor that is linked to the AIL. Basically, we store the next item in the cursor whenever we want to drop the AIL lock to do something to the current item. When we regain the lock, the current item may already be free, so we can't reference it, but the next item in the traversal is already held in the cursor. When we move or delete an object, we search all the active cursors and if there is an item match we clear the cursor(s) that point to the object. This forces the traversal to restart transparently. We don't invalidate the cursor on insert because the cursor still points to a valid item. If the item is inserted between the current item and the cursor it does not matter; the traversal is considered to be past the insertion point so it will be picked up in the next traversal. Hence traversal restarts pretty much disappear altogether with this method of traversal, which should substantially reduce the overhead of pushing on a busy AIL. Version 2 o add restart logic o comment cursor interface o minor cleanups SGI-PV: 988143 SGI-Modid: xfs-linux-melb:xfs-kern:32347a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
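The cursor idea described above can be sketched in plain C. This is an illustration under assumptions, not the XFS implementation: all `sk_` names are hypothetical, the list is singly linked, locking is omitted entirely, and a cleared cursor simply restarts from the head of the list.

```c
#include <stddef.h>

/* Hypothetical stand-ins for AIL items, cursors and the AIL itself. */
struct sk_item {
	struct sk_item *next;
	int lsn;
};

struct sk_cursor {
	struct sk_cursor *next;	/* link in the list of active cursors */
	struct sk_item *item;	/* next item the traversal will visit */
	int cleared;		/* set when the remembered item went away */
};

struct sk_ail {
	struct sk_item *head;
	struct sk_cursor *cursors;
};

/* Begin a traversal: register the cursor and remember the successor. */
static struct sk_item *sk_first(struct sk_ail *ail, struct sk_cursor *cur)
{
	cur->cleared = 0;
	cur->item = ail->head ? ail->head->next : NULL;
	cur->next = ail->cursors;
	ail->cursors = cur;
	return ail->head;
}

/* Continue a traversal; a cleared cursor restarts from the head. */
static struct sk_item *sk_next(struct sk_ail *ail, struct sk_cursor *cur)
{
	struct sk_item *it;

	if (cur->cleared) {
		cur->cleared = 0;
		it = ail->head;
	} else {
		it = cur->item;
	}
	cur->item = it ? it->next : NULL;
	return it;
}

/* Delete an item, clearing every cursor that was pointing at it. */
static void sk_delete(struct sk_ail *ail, struct sk_item *victim)
{
	struct sk_cursor *c;
	struct sk_item **pp;

	for (c = ail->cursors; c; c = c->next)
		if (c->item == victim)
			c->cleared = 1;

	for (pp = &ail->head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			break;
		}
	}
}
```

Deleting the item that the traversal is currently working on leaves the cursor untouched, because the cursor only remembers the successor. Only deleting the remembered successor forces a restart. A complete version would also need a way to unregister a finished cursor.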
|
#
73f6aa4d |
|
10-Oct-2008 |
Christoph Hellwig <hch@lst.de> |
Fix barrier fail detection in XFS Currently we disable barriers as soon as we get a buffer in xlog_iodone that has the XBF_ORDERED flag cleared. But this can be the case not only for buffers where the barrier failed, but also the first buffer of a split log write in case of a log wraparound. Due to the disabled barriers we can easily get directory corruption on unclean shutdowns. So instead of using this check add a new buffer flag for failed barrier writes. This is a regression vs 2.6.26 caused by patch to use the right macro to check for the ORDERED flag, as we previously got true returned for every buffer. Thanks to Toei Rei for reporting the bug. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: David Chinner <david@fromorbit.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b5b8c9ac |
|
17-Sep-2008 |
David Chinner <david@fromorbit.com> |
[XFS] Fix barrier status change detection. The current code in xlog_iodone() uses the wrong macro to check if the barrier has been cleared due to an EOPNOTSUPP error from the lower layer. SGI-PV: 986143 SGI-Modid: xfs-linux-melb:xfs-kern:31984a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Nathaniel W. Turner <nate@houseofnate.net> Signed-off-by: Peter Leckie <pleckie@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
31bd61f2 |
|
17-Sep-2008 |
Lachlan McIlroy <lachlan@sgi.com> |
[XFS] Move memory allocations for log tracing out of the critical path Memory allocations for log->l_grant_trace and iclog->ic_trace are done on demand when the first event is logged. In xlog_state_get_iclog_space() we call xlog_trace_iclog() under a spinlock and allocating memory here can cause us to sleep with a spinlock held and deadlock the system. For the log grant tracing we use KM_NOSLEEP but that means we can lose trace entries. Since there is no locking to serialize the log grant tracing we could race and have multiple allocations and leak memory. So move the allocations to where we initialize the log/iclog structures. Use KM_NOFS to avoid recursing into the filesystem and drop log->l_trace since it's not even used. SGI-PV: 983738 SGI-Modid: xfs-linux-melb:xfs-kern:31896a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>
|
#
c6a7b0f8 |
|
13-Aug-2008 |
Lachlan McIlroy <lachlan@sgi.com> |
[XFS] Fix use after free in xfs_log_done(). The ticket allocation code got reworked in 2.6.26 and we now free tickets whereas before we used to cache them so the use-after-free went undetected. SGI-PV: 985525 SGI-Modid: xfs-linux-melb:xfs-kern:31877a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
|
#
5695ef46 |
|
13-Aug-2008 |
Lachlan McIlroy <lachlan@sgi.com> |
[XFS] Use KM_NOFS for debug trace buffers Use KM_NOFS to prevent recursion back into the filesystem which can cause deadlocks. In the case of xfs_iread() we hold the lock on the inode cluster buffer while allocating memory for the trace buffers. If we recurse back into XFS to flush data, that may require a transaction to allocate extents, which needs log space. This can deadlock with the xfsaild thread which can't push the tail of the log because it is trying to get the inode cluster buffer lock. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31838a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>
|
#
4249023a |
|
13-Aug-2008 |
Christoph Hellwig <hch@infradead.org> |
[XFS] cleanup xfs_mountfs Remove all the useless flags and code keyed off it in xfs_mountfs. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31831a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
12017faf |
|
13-Aug-2008 |
David Chinner <david@fromorbit.com> |
[XFS] clean up stale references to semaphores A lot of code has been converted away from semaphores, but there are still comments that reference semaphore behaviour. The log code is the worst offender. Update the comments to reflect what the code really does now. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31814a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
d748c623 |
|
19-May-2008 |
Matthew Wilcox <willy@infradead.org> |
[XFS] Convert l_flushsema to a sv_t The l_flushsema doesn't exactly have completion semantics, nor mutex semantics. It's used as a list of tasks which are waiting to be notified that a flush has completed. It was also being used in a way that was potentially racy, depending on the semaphore implementation. By using a sv_t instead of a semaphore we avoid the need for a separate counter, since we know we just need to wake everything on the queue. Original waitqueue implementation from Matthew Wilcox. Cleanup and conversion to sv_t by Christoph Hellwig. SGI-PV: 981507 SGI-Modid: xfs-linux-melb:xfs-kern:31059a Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
d729eae8 |
|
19-May-2008 |
Michael Nishimoto <miken@agami.com> |
[XFS] Ensure that 2 GiB xfs logs work properly. We found this while experimenting with 2GiB xfs logs. The previous code never assumed that xfs logs would ever get so large. SGI-PV: 981502 SGI-Modid: xfs-linux-melb:xfs-kern:31058a Signed-off-by: Michael Nishimoto <miken@agami.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
f0e2d93c |
|
19-May-2008 |
Denys Vlasenko <vda.linux@googlemail.com> |
[XFS] Remove unused arg from kmem_free() kmem_free() function takes (ptr, size) arguments but doesn't actually use second one. This patch removes size argument from all callsites. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31050a Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
49641f1a |
|
11-Jul-2008 |
Dave Chinner <david@fromorbit.com> |
Fix reference counting race on log buffers When we release the iclog, we do an atomic_dec_and_lock to determine if we are the last reference and need to trigger update of log headers and writeout. However, in xlog_state_get_iclog_space() we also need to check if we have the last reference count there. If we do, we release the log buffer, otherwise we decrement the reference count. But the compare and decrement in xlog_state_get_iclog_space() is not atomic, so both places can see a reference count of 2 and neither will release the iclog. That leads to a filesystem hang. Close the race by replacing the atomic_read() and atomic_dec() pair with atomic_add_unless() to ensure that they are executed atomically. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Tim Shimmin <tes@sgi.com> Tested-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
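The race and its fix can be illustrated in userspace with C11 atomics. This is a sketch under assumptions, not the kernel code: `sketch_add_unless` mimics the semantics of the kernel's atomic_add_unless() with a compare-and-swap loop, and `sketch_put_unless_last` is a hypothetical helper name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Add `a` to *v unless *v currently equals `u`.
 * Returns true if the add happened, false if *v was `u`.
 * The check and the update form one atomic step.
 */
static bool sketch_add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/*
 * Drop one reference unless the caller holds the last one.
 * A false return means the caller owns the final reference and
 * has to take the full release path itself.
 */
static bool sketch_put_unless_last(atomic_int *refcnt)
{
	return sketch_add_unless(refcnt, -1, 1);
}
```

With a separate read followed by a decrement, two racing paths can both observe a count of 2 and both conclude that the other one will do the release. Because the comparison and the decrement happen in a single atomic step here, exactly one path ends up seeing the last reference.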
|
#
1bb7d6b5 |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Catch log unmount failures. Unmounting the log can fail. Unlikely, but it can. Catch all the error conditions and make sure it's propagated upwards. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30833a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
b911ca04 |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Sanitise xfs_log_force error checking. xfs_log_force() is declared to return an error, but we almost never check it. We don't need to check it in most cases; if there's a log I/O error then we'll be shutting down the filesystem anyway and that means we'll catch the error somewhere else. However, on certain calls we should be returning an error - sync transactions, fsync, sync writes, etc. so this isn't a pure black and white distinction. Hence make xfs_log_force() a void function that issues a warning to the syslog on error, and call _xfs_log_force() in all the places where we actually care about the error status returned. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30832a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
34a622b2 |
|
09-Apr-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
[XFS] replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30775a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
6b1d1a73 |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Fix lock inversion in forced shutdown. Recent changes to xlog_state_release_iclog() placed the grant_lock inside the icloglock. Forced unmount of the log does this the opposite way around, but does not depend on the order for correct working. Fix the inversion by changing the order locks are gained in xfs_log_force_umount(). SGI-PV: 979661 SGI-Modid: xfs-linux-melb:xfs-kern:30773a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
4679b2d3 |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Reorganise xlog_t for better cacheline isolation of contention To reduce contention on the log on large CPU count machines, separate out different parts of the xlog_t structure onto different cachelines. Move each lock onto a different cacheline along with all the members that are accessed/modified while that lock is held. Also, move the debugging code into debug code. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30772a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
eb01c9cd |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Remove the xlog_ticket allocator The ticket allocator is just a simple slab implementation internal to the log. It requires the icloglock to be held when manipulating it and this contributes to contention on that lock. Just kill the entire allocator and use a memory zone instead. While there, allow us to gracefully fail allocation with ENOMEM. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30771a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
114d23aa |
|
09-Apr-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Per iclog callback chain lock Rather than use the icloglock for protecting the iclog completion callback chain, use a new per-iclog lock so that walking the callback chain doesn't require holding a global lock. This reduces contention on the icloglock during transaction commit and log I/O completion by reducing the number of times we need to hold the global icloglock during these operations. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30770a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
155cc6b7 |
|
05-Mar-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Use atomics for iclog reference counting Now that we update the log tail LSN less frequently on transaction completion, we pass the contention straight to the global log state lock (l_iclog_lock) during transaction completion. We currently have to take this lock to decrement the iclog reference count. There is a reference count on each iclog, so we need to take the global lock for all refcount changes. When large numbers of processes are all doing small transactions, the iclog reference counts will be quite high, and the state change that absolutely requires the l_iclog_lock is the exception rather than the norm. Change the reference counting on the iclogs to use atomic_inc/dec so that we can use atomic_dec_and_lock during transaction completion and avoid the need for grabbing the l_iclog_lock for every reference count decrement except the one that matters - the last. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30505a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
b589334c |
|
05-Mar-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Prevent AIL lock contention during transaction completion When hundreds of processors attempt to commit transactions at the same time, they can contend on the AIL lock when updating the tail LSN held in the in-core log structure. At the moment, the tail LSN is only needed when actually writing out an iclog, so it really does not need to be updated on every single transaction completion - only those that result in switching iclogs and flushing them to disk. The result is that we reduce the number of times we need to grab the AIL lock and the log grant lock by up to two orders of magnitude on large processor count machines. The problem has previously been hidden by AIL lock contention walking the AIL list which was recently solved and uncovered this issue. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30504a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
62118709 |
|
05-Mar-2008 |
Eric Sandeen <sandeen@sandeen.net> |
[XFS] remove shouting-indirection macros from xfs_sb.h Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
413d57c9 |
|
13-Feb-2008 |
Marcin Slusarz <marcin.slusarz@gmail.com> |
xfs: convert beX_add to beX_add_cpu (new common API) remove beX_add functions and replace all uses with beX_add_cpu Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Dave Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
249a8c11 |
|
04-Feb-2008 |
David Chinner <dgc@sgi.com> |
[XFS] Move AIL pushing into its own thread When many hundreds to thousands of threads all try to do simultaneous transactions and the log is in a tail-pushing situation (i.e. full), we can get multiple threads walking the AIL list and contending on the AIL lock. The AIL push is, in effect, a simple I/O dispatch algorithm complicated by the ordering constraints placed on it by the transaction subsystem. It really does not need multiple threads to push on it - even when only a single CPU is pushing the AIL, it can push the I/O out far faster than pretty much any disk subsystem can handle. So, to avoid contention problems stemming from multiple list walkers, move the list walk off into another thread and simply provide a "target" to push to. When a thread requires a push, it sets the target and wakes the push thread, then goes to sleep waiting for the required amount of space to become available in the log. This mechanism should also be a lot fairer under heavy load as the waiters will queue in arrival order, rather than queuing in "who completed a push first" order. Also, by moving the pushing to a separate thread we can more effectively do overload detection and prevention as we can keep context from loop iteration to loop iteration. That is, we can push only part of the list each loop and not have to loop back to the start of the list every time we run. This should also help by reducing the number of items we try to lock and/or push items that we cannot move. Note that this patch is not intended to solve the inefficiencies in the AIL structure and the associated issues with extremely large list contents. That needs to be addressed separately; parallel access would cause problems to any new structure as well, so I'm only aiming to isolate the structure from unbounded parallelism here. SGI-PV: 972759 SGI-Modid: xfs-linux-melb:xfs-kern:30371a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
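The "set a target, let one pusher walk the list and keep its place between iterations" scheme from the entry above can be modelled without any threads. The sketch below is a single-threaded illustration under stated assumptions: the AIL is reduced to a sorted array of LSNs, and `demo_ail`, `demo_set_target` and `demo_push_once` are hypothetical names, not XFS functions. Real code would add locking, sleeping and wakeups.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much simplified model of the AIL and its push state. */
struct demo_ail {
    const uint64_t *lsns;    /* item LSNs, sorted in ascending order */
    size_t          nitems;
    size_t          cursor;  /* next item to push; survives across iterations */
    uint64_t        target;  /* push every item with lsn <= target */
};

/* Waiters only ever move the target forward before waking the pusher. */
static void demo_set_target(struct demo_ail *ail, uint64_t lsn)
{
    if (lsn > ail->target)
        ail->target = lsn;
}

/*
 * One loop iteration of the push thread: push at most 'batch' items,
 * resuming from the saved cursor rather than from the head of the list.
 * Returns the number of items pushed in this iteration.
 */
static size_t demo_push_once(struct demo_ail *ail, size_t batch)
{
    size_t pushed = 0;

    while (pushed < batch &&
           ail->cursor < ail->nitems &&
           ail->lsns[ail->cursor] <= ail->target) {
        ail->cursor++;   /* stands in for issuing I/O on the item */
        pushed++;
    }
    return pushed;
}
```

Keeping `cursor` in the structure is what lets the pusher do bounded work per iteration without rescanning from the start, which is the "keep context from loop iteration to loop iteration" point in the message.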
#
e6a4b37f |
|
22-Nov-2007 |
Tim Shimmin <tes@sgi.com> |
[XFS] Remove the BPCSHIFT and NB* based macros from XFS. The BPCSHIFT based macros, btoc*, ctob*, offtoc* and ctooff are either not used or don't need to be used. The NDPP, NDPP, NBBY macros don't need to be used but instead are replaced directly by PAGE_SIZE and PAGE_CACHE_SIZE where appropriate. Initial patch and motivation from Nicolas Kaiser. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30096a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
f7b7c367 |
|
26-Nov-2007 |
Niv Sardi <xaiki@sgi.com> |
[XFS] Remove bogus assert This assert is bogus. We can have a forced shutdown occur between the check for the XLOG_FORCED_SHUTDOWN and the ASSERT. Also, the logging system shouldn't care about the state of XFS_FORCED_SHUTDOWN, it should only check XLOG_FORCED_SHUTDOWN. The logging system has its own forced shutdown flag so, for the case of a forced shutdown that's not due to a logging error, we can flush the log. SGI-PV: 972985 SGI-Modid: xfs-linux-melb:xfs-kern:30029a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
a8272ce0 |
|
22-Nov-2007 |
David Chinner <dgc@sgi.com> |
[XFS] Fix up sparse warnings. These are mostly locking annotations, marking things static, casts where needed and declaring stuff in header files. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30002a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
|
#
b53e675d |
|
11-Oct-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] xlog_rec_header/xlog_rec_ext_header endianness annotations Mostly trivial conversion with one exception: h_num_logops was kept in native endian previously and only converted to big endian in xlog_sync, but we always keep it big endian now. With today's CPUs' fast byteswap instructions that's not an issue but the new variant keeps the code clean and maintainable. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29821a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
67fcb7bf |
|
11-Oct-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual byteswap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log parameter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional definitions depending on host endianness we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29820a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
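As an illustration of the "single inline plus a manual byteswap" approach described in the entry above, here is a minimal user-space sketch. It assumes the usual packing of a log sequence number as a 32-bit cycle in the upper half and a 32-bit block number in the lower half of a 64-bit value; the `demo_` names are hypothetical and this is not the kernel's xlog_assign_lsn.

```c
#include <stdint.h>

/* Pack a cycle number and a block number into one 64-bit LSN (host order). */
static inline uint64_t demo_assign_lsn(uint32_t cycle, uint32_t block)
{
    return ((uint64_t)cycle << 32) | block;
}

/* Unpack the two halves again. */
static inline uint32_t demo_lsn_cycle(uint64_t lsn)
{
    return (uint32_t)(lsn >> 32);
}

static inline uint32_t demo_lsn_block(uint64_t lsn)
{
    return (uint32_t)lsn;
}

/*
 * Store an LSN in big-endian byte order, independent of host endianness.
 * Shifting byte by byte avoids any conditional definitions per host type.
 */
static void demo_lsn_to_disk(uint64_t lsn, unsigned char out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(lsn >> (56 - 8 * i));
}
```

Passing the cycle and block explicitly, rather than a log structure to pull them from, keeps the helper usable from any caller, which matches the calling-convention cleanup the message mentions.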
#
03bea6fe |
|
11-Oct-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] clean up some xfs_log_priv.h macros - the various assign lsn macros are replaced by a single inline, xlog_assign_lsn, which is equivalent to ASSIGN_ANY_LSN_HOST except for a more sane calling convention. ASSIGN_LSN_DISK is replaced by xlog_assign_lsn and a manual byteswap, and ASSIGN_LSN by the same, except we pass the cycle and block arguments explicitly instead of a log parameter. The latter two variants only had 2, respectively one user anyway. - the GET_CYCLE is replaced by a xlog_get_cycle inline with exactly the same calling conventions. - GET_CLIENT_ID is replaced by xlog_get_client_id which leaves away the unused arch argument. Instead of conditional definitions depending on host endianness we now do an unconditional swap and shift then, which generates equal code. - the unused XLOG_SET macro is removed. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29819a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
007c61c6 |
|
11-Oct-2007 |
Eric Sandeen <sandeen@sandeen.net> |
[XFS] Remove spin.h remove spinlock init abstraction macro in spin.h, remove the callers, and remove the file. Move no-op spinlock_destroy to xfs_linux.h Cleanup spinlock locals in xfs_mount.c SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29751a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
c8b5ea28 |
|
11-Oct-2007 |
Eric Sandeen <sandeen@sandeen.net> |
[XFS] Unwrap GRANT_LOCK. Un-obfuscate GRANT_LOCK, remove GRANT_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29741a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
b22cd72c |
|
11-Oct-2007 |
Eric Sandeen <sandeen@sandeen.net> |
[XFS] Unwrap LOG_LOCK. Un-obfuscate LOG_LOCK, remove LOG_LOCK->mutex_lock->spin_lock macros, call spin_lock directly, remove extraneous cookie holdover from old xfs code, and change lock type to spinlock_t. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29740a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
0adba536 |
|
30-Aug-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] replace some large xfs_log_priv.h macros by proper functions ... or in the case of XLOG_TIC_ADD_OPHDR remove a useless macro entirely. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29511a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
bd186aa9 |
|
30-Aug-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] kill the vfs_flags member in struct bhv_vfs All flags are added to xfs_mount's m_flag instead. Note that the 32bit inode flag was duplicated in both of them, but only cleared in the mount when it was not necessary due to the filesystem being small enough. Two flags are still required here - one to indicate the mount option setting, and one to indicate if it applies or not. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29507a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
1cb51258 |
|
16-Aug-2007 |
Eric Sandeen <sandeen@sandeen.net> |
[XFS] choose single default logbuf count & size Remove sizing of logbuf size & count based on physical memory; this was never a very good gauge as it's looking at global memory, but deciding on sizing per-filesystem; no account is made of the total number of filesystems, for example. For now just take the largest "default" case, as was set for machines with >400MB - 8 x 32k buffers. This can always be tuned higher or lower with mount options if necessary. Removes one more user of xfs_physmem. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29323a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
0bfefc46 |
|
14-May-2007 |
David Chinner <dgc@sgi.com> |
[XFS] Barriers need to be dynamically checked and switched off If the underlying block device suddenly stops supporting barriers, we need to handle the -EOPNOTSUPP error in a sane manner rather than shutting down the filesystem. If we get this error, clear the barrier flag, reissue the I/O, and tell the world bad things are occurring. SGI-PV: 964544 SGI-Modid: xfs-linux-melb:xfs-kern:28568a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
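The recovery path described in the entry above (on -EOPNOTSUPP, clear the barrier flag, reissue the I/O, and warn) can be sketched as follows. This is a hedged user-space illustration only: `demo_mount`, `demo_write_log`, and the fake device used to exercise it are hypothetical and do not correspond to kernel structures or block-layer calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state: are we currently issuing barrier writes? */
struct demo_mount {
    bool barriers;
};

/* Hypothetical submit hook: returns 0 on success or a negative errno. */
typedef int (*demo_submit_fn)(bool ordered, void *ctx);

/* Fake device for demonstration: counts submissions, may reject barriers. */
struct demo_fake_dev {
    bool supports_ordered;
    int  submissions;
};

static int demo_fake_submit(bool ordered, void *ctx)
{
    struct demo_fake_dev *dev = ctx;

    dev->submissions++;
    if (ordered && !dev->supports_ordered)
        return -EOPNOTSUPP;
    return 0;
}

/*
 * Issue a log write. If the device rejects the barrier, switch barriers
 * off for this mount, warn, and reissue the same write unordered instead
 * of treating the error as fatal.
 */
static int demo_write_log(struct demo_mount *mp, demo_submit_fn submit, void *ctx)
{
    int err = submit(mp->barriers, ctx);

    if (err == -EOPNOTSUPP && mp->barriers) {
        fprintf(stderr, "demo: device stopped supporting barriers, disabling\n");
        mp->barriers = false;
        err = submit(false, ctx);
    }
    return err;
}
```

Because the flag is cleared on the mount, later writes skip the barrier attempt entirely and cost a single submission each.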
#
4b80916b |
|
15-Aug-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] Fix sparse NULL vs 0 warnings Sparse now warns about comparing pointers to 0, so change all instance where that happens to NULL instead. SGI-PV: 968555 SGI-Modid: xfs-linux-melb:xfs-kern:29308a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
92821e2b |
|
23-May-2007 |
David Chinner <dgc@sgi.com> |
[XFS] Lazy Superblock Counters When we have a couple of hundred transactions on the fly at once, they all typically modify the on disk superblock in some way. create/unlink/mkdir/rmdir modify inode counts, allocation/freeing modify free block counts. When these counts are modified in a transaction, they must eventually lock the superblock buffer and apply the mods. The buffer then remains locked until the transaction is committed into the incore log buffer. The result of this is that with enough transactions on the fly the incore superblock buffer becomes a bottleneck. The result of contention on the incore superblock buffer is that transaction rates fall - the more pressure that is put on the superblock buffer, the slower things go. The key to removing the contention is to not require the superblock fields in question to be locked. We do that by not marking the superblock dirty in the transaction. IOWs, we modify the incore superblock but do not modify the cached superblock buffer. In short, we do not log superblock modifications to critical fields in the superblock on every transaction. In fact we only do it just before we write the superblock to disk every sync period or just before unmount. This creates an interesting problem - if we don't log or write out the fields in every transaction, then how do the values get recovered after a crash? The answer is simple - we keep enough duplicate, logged information in other structures that we can reconstruct the correct count after log recovery has been performed. It is the AGF and AGI structures that contain the duplicate information; after recovery, we walk every AGI and AGF and sum their individual counters to get the correct value, and we do a transaction into the log to correct them. 
An optimisation of this is that if we have a clean unmount record, we know the value in the superblock is correct, so we can avoid the summation walk under normal conditions and so mount/recovery times do not change under normal operation. One wrinkle that was discovered during development was that the blocks used in the freespace btrees are never accounted for in the AGF counters. This was once a valid optimisation to make; when the filesystem is full, the free space btrees are empty and consume no space. Hence when it matters, the "accounting" is correct. But that means that when we do the AGF summations, we would not have a correct count and xfs_check would complain. Hence a new counter was added to track the number of blocks used by the free space btrees. This is an *on-disk format change*. As a result of this, lazy superblock counters are a mkfs option and at the moment on Linux there is no way to convert an old filesystem. This is possible - xfs_db can be used to twiddle the right bits and then xfs_repair will do the format conversion for you. Similarly, you can convert backwards as well. At some point we'll add functionality to xfs_admin to do the bit twiddling easily.... SGI-PV: 964999 SGI-Modid: xfs-linux-melb:xfs-kern:28652a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
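The recovery step described in the entry above (skip the walk after a clean unmount, otherwise sum the per-allocation-group counters) can be sketched like this. The structures and field names are hypothetical simplifications, not the on-disk AGF/AGI layout, and counting free-space btree blocks into the free block total is an assumption made for illustration based on the new counter the message describes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-allocation-group counters (stand-ins for AGF/AGI data). */
struct demo_ag_counters {
    uint64_t free_blocks;   /* free blocks tracked by the AG */
    uint64_t btree_blocks;  /* blocks used by the free space btrees */
    uint64_t inode_count;   /* allocated inodes */
    uint64_t free_inodes;   /* allocated but unused inodes */
};

/* Hypothetical in-core superblock counters that are updated lazily. */
struct demo_sb_counters {
    uint64_t fdblocks;
    uint64_t icount;
    uint64_t ifree;
};

/*
 * After log recovery, rebuild the superblock counters from the per-AG
 * values. Returns true if the counters were recomputed, false if the walk
 * was skipped because the last unmount was clean and they are trusted.
 */
static bool demo_recover_counters(struct demo_sb_counters *sb,
                                  const struct demo_ag_counters *ags,
                                  size_t nags, bool clean_unmount)
{
    struct demo_sb_counters sum = { 0, 0, 0 };

    if (clean_unmount)
        return false;

    for (size_t i = 0; i < nags; i++) {
        /* Assumption: btree blocks count towards the free block total. */
        sum.fdblocks += ags[i].free_blocks + ags[i].btree_blocks;
        sum.icount   += ags[i].inode_count;
        sum.ifree    += ags[i].free_inodes;
    }
    *sb = sum;
    return true;
}
```

The early return is the optimisation the message points out: mounts after a clean unmount pay nothing for the summation walk.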
#
511105b3 |
|
23-May-2007 |
David Chinner <dgc@sgi.com> |
[XFS] Fix vmalloc leak on mount/unmount. When setting the length of the iclogbuf to write out we should just be changing the desired byte count rather than completely reassociating the buffer memory with the buffer. Reassociating the buffer memory changes the apparent length of the buffer and hence when we free the buffer, we don't free all the vmap()d space we originally allocated. SGI-PV: 964983 SGI-Modid: xfs-linux-melb:xfs-kern:28640a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
3db296f3 |
|
14-May-2007 |
David Chinner <dgc@sgi.com> |
[XFS] Fix use-after-free during log unmount. Don't reference the log buffer after running the callbacks as the callback can trigger the log buffers to be freed during unmount. SGI-PV: 964545 SGI-Modid: xfs-linux-melb:xfs-kern:28567a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
1fa40b01 |
|
14-May-2007 |
Christoph Hellwig <hch@infradead.org> |
[XFS] Only use refcounted pages for I/O Many block drivers (aoe, iscsi) really want refcountable pages in bios, which is what almost everyone sends down. XFS unfortunately has a few places where it sends down buffers that may come from kmalloc, which breaks them. Fix the places that use kmalloc()d buffers. SGI-PV: 964546 SGI-Modid: xfs-linux-melb:xfs-kern:28562a Signed-Off-By: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
955e47ad |
|
27-Sep-2006 |
Tim Shimmin <tes@sgi.com> |
[XFS] Fixes the leak in reservation space because we weren't ungranting space for the unmount record - which becomes a problem in the freeze/thaw scenario. SGI-PV: 942533 SGI-Modid: xfs-linux-melb:xfs-kern:26815a Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
efb8ad7e |
|
27-Sep-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Add a debug flag for allocations which are known to be larger than one page. SGI-PV: 955302 SGI-Modid: xfs-linux-melb:xfs-kern:26800a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
a3c6685e |
|
27-Sep-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Ensure xlog_state_do_callback does not report spurious warnings on ramdisks. SGI-PV: 954802 SGI-Modid: xfs-linux-melb:xfs-kern:26627a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>
|
#
f5faad79 |
|
28-Jul-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Fix remount vs no/barrier options by ensuring we clear unwanted flags from iclog buffers before submitting them for writing. SGI-PV: 954772 SGI-Modid: xfs-linux-melb:xfs-kern:26605a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
5493a0fc |
|
27-Jun-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Fixup whitespace damage in log_write, remove final warning. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:26366a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
f6c2d1fa |
|
19-Jun-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Remove version 1 directory code. Never functioned on Linux, just pure bloat. SGI-PV: 952969 SGI-Modid: xfs-linux-melb:xfs-kern:26251a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
34327e13 |
|
09-Jun-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Cleanup a missed porting conversion, and freezing. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26109a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
b83bd138 |
|
09-Jun-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters. SGI-PV: 9533338 SGI-Modid: xfs-linux-melb:xfs-kern:26106a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
7d04a335 |
|
08-Jun-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] Shutdown the filesystem if all device paths have gone. Made shutdown vop flags consistent with sync vop flags declarations too. SGI-PV: 939911 SGI-Modid: xfs-linux-melb:xfs-kern:26096a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
c41564b5 |
|
28-Mar-2006 |
Nathan Scott <nathans@sgi.com> |
[XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all these typos. SGI-PV: 904196 SGI-Modid: xfs-linux-melb:xfs-kern:25539a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
014c2544 |
|
14-Jan-2006 |
Jesper Juhl <juhl-lkml@dif.dk> |
return statement cleanup - kill pointless parentheses This patch removes pointless parentheses from return statements. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
1259845d |
|
11-Jan-2006 |
Tim Shimmin <tes@sgi.com> |
[XFS] remove XFS_LOG_RES_DEBUG and turn on the res history all the time to get more useful error info on space for trans items SGI-PV: 947110 SGI-Modid: xfs-linux-melb:xfs-kern:24886a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
dd954c69 |
|
10-Jan-2006 |
Christoph Hellwig <hch@sgi.com> |
[XFS] turn xlog helper macros into real functions SGI-PV: 946205 SGI-Modid: xfs-linux-melb:xfs-kern:203360a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
65be6054 |
|
10-Jan-2006 |
Eric Sandeen <sandeen@sgi.com> |
[XFS] remove unused "readonly" arg from xlog_find_tail and xlog_recover SGI-PV: 946611 SGI-Modid: xfs-linux-melb:xfs-kern:203307a Signed-off-by: Eric Sandeen <sandeen@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
cfcbbbd0 |
|
01-Nov-2005 |
Nathan Scott <nathans@sgi.com> |
[XFS] Remove old, broken nolog-mode code - no one plans to ever fix it. SGI-PV: 944821 SGI-Modid: xfs-linux:xfs-kern:24213a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
7b718769 |
|
01-Nov-2005 |
Nathan Scott <nathans@sgi.com> |
[XFS] Update license/copyright notices to match the preferred SGI boilerplate. SGI-PV: 913862 SGI-Modid: xfs-linux:xfs-kern:23903a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
a844f451 |
|
01-Nov-2005 |
Nathan Scott <nathans@sgi.com> |
[XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot. SGI-PV: 943122 SGI-Modid: xfs-linux:xfs-kern:23901a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
fc1f8c1c |
|
01-Nov-2005 |
Nathan Scott <nathans@sgi.com> |
[XFS] Track external log/realtime device names for correct reporting in /proc/mounts. SGI-PV: 942984 SGI-Modid: xfs-linux:xfs-kern:23862a Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
f538d4da |
|
01-Nov-2005 |
Christoph Hellwig <hch@sgi.com> |
[XFS] write barrier support Issue all log sync operations as ordered writes. In addition flush the disk cache on fsync if the sync cached operation didn't sync the log to disk (this requires some additional bookkeeping in the transaction and log code). If the device doesn't claim to support barriers, the filesystem has an external log volume or the trial superblock write with barriers enabled failed we disable barriers and print a warning. We should probably fail the mount completely, but that could lead to nasty boot failures for the root filesystem. Not enabled by default yet, needs more destructive testing first. SGI-PV: 912426 SGI-Modid: xfs-linux:xfs-kern:198723a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
da1650a5 |
|
01-Nov-2005 |
Christoph Hellwig <hch@sgi.com> |
[XFS] Add format checking to cmn_err and icmn_err SGI-PV: 942243 SGI-Modid: xfs-linux:xfs-kern:198658a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
7e9c6396 |
|
02-Sep-2005 |
Tim Shimmin <tes@sgi.com> |
[XFS] 929956 add log debugging and tracing info SGI-PV: 931456 SGI-Modid: xfs-linux:xfs-kern:23155a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
32fb9b57 |
|
02-Sep-2005 |
Tim Shimmin <tes@sgi.com> |
[XFS] Fix up the calculation of the reservation overhead to hopefully include all the components which make up the transaction in the ondisk log. Having this incomplete has shown up as problems on IRIX when some v2 log changes went in. The symptom was the msg of "xfs_log_write: reservation ran out. Need to up reservation" and was seen on synchronous writes on files with lots of holes (and therefore lots of extents). SGI-PV: 931457 SGI-Modid: xfs-linux:xfs-kern:23095a Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
ba0f32d4 |
|
20-Jun-2005 |
Christoph Hellwig <hch@sgi.com> |
[XFS] mark various symbols static Patch from Adrian Bunk SGI-PV: 936255 SGI-Modid: xfs-linux:xfs-kern:192760a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>
|
#
1da177e4 |
|
16-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|