#
b4e73e61 |
|
12-Dec-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
jbd2: abort journal when detecting metadata writeback error of fs dev This is a replacement solution of commit bc71726c725767 ("ext4: abort the filesystem if failed to async write metadata buffer"), JBD2 can detect metadata writeback error of fs dev by 'j_fs_dev_wb_err'. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
147d4a09 |
|
07-Sep-2023 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
jbd2: Remove page size assumptions jbd2_alloc() allocates a buffer from slab when the block size is smaller than PAGE_SIZE, and slab may be using a compound page. Before commit 8147c4c4546f, we set b_page to the precise page containing the buffer and this code worked well. Now we set b_page to the head page of the allocation, so we can no longer use offset_in_page(). While we could do a 1:1 replacement with offset_in_folio(), use the more idiomatic bh_offset() and the folio APIs to map the buffer. This isn't enough to support a b_size larger than PAGE_SIZE on HIGHMEM machines, but this is good enough to fix the actual bug we're seeing. Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()") Reported-by: Zorro Lang <zlang@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> [converted to be more folio] Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
3c55097c |
|
06-Jun-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: remove __journal_try_to_free_buffer() __journal_try_to_free_buffer() has only one caller and it's logic is much simple now, so just remove it and open code in jbd2_journal_try_to_free_buffers(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
46f881b5 |
|
06-Jun-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: fix a race when checking checkpoint buffer busy Before removing checkpoint buffer from the t_checkpoint_list, we have to check both BH_Dirty and BH_Lock bits together to distinguish buffers have not been or were being written back. But __cp_buffer_busy() checks them separately, it first check lock state and then check dirty, the window between these two checks could be raced by writing back procedure, which locks buffer and clears buffer dirty before I/O completes. So it cannot guarantee checkpointing buffers been written back to disk if some error happens later. Finally, it may clean checkpoint transactions and lead to inconsistent filesystem. jbd2_journal_forget() and __journal_try_to_free_buffer() also have the same problem (journal_unmap_buffer() escape from this issue since it's running under the buffer lock), so fix them through introducing a new helper to try holding the buffer lock and remove really clean buffer. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490 Cc: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
bd159398 |
|
29-Mar-2023 |
Jan Kara <jack@suse.cz> |
jdb2: Don't refuse invalidation of already invalidated buffers When invalidating buffers under the partial tail page, jbd2_journal_invalidate_folio() returns -EBUSY if the buffer is part of the committing transaction as we cannot safely modify buffer state. However if the buffer is already invalidated (due to previous invalidation attempts from ext4_wait_for_tail_page_commit()), there's nothing to do and there's no point in returning -EBUSY. This fixes occasional warnings from ext4_journalled_invalidate_folio() triggered by generic/051 fstest when blocksize < pagesize. Fixes: 53e872681fed ("ext4: fix deadlock in journal_unmap_buffer()") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230329154950.19720-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e6b9bd72 |
|
09-Jan-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
jbd2: fix data missing when reusing bh which is ready to be checkpointed Following process will make data lost and could lead to a filesystem corrupted problem: 1. jh(bh) is inserted into T1->t_checkpoint_list, bh is dirty, and jh->b_transaction = NULL 2. T1 is added into journal->j_checkpoint_transactions. 3. Get bh prepare to write while doing checkpoing: PA PB do_get_write_access jbd2_log_do_checkpoint spin_lock(&jh->b_state_lock) if (buffer_dirty(bh)) clear_buffer_dirty(bh) // clear buffer dirty set_buffer_jbddirty(bh) transaction = journal->j_checkpoint_transactions jh = transaction->t_checkpoint_list if (!buffer_dirty(bh)) __jbd2_journal_remove_checkpoint(jh) // bh won't be flushed jbd2_cleanup_journal_tail __jbd2_journal_file_buffer(jh, transaction, BJ_Reserved) 4. Aborting journal/Power-cut before writing latest bh on journal area. In this way we get a corrupted filesystem with bh's data lost. Fix it by moving the clearing of buffer_dirty bit just before the call to __jbd2_journal_file_buffer(), both bit clearing and jh->b_transaction assignment are under journal->j_list_lock locked, so that jbd2_log_do_checkpoint() will wait until jh's new transaction fininshed even bh is currently not dirty. And journal_shrink_one_cp_list() won't remove jh from checkpoint list if the buffer head is reused in do_get_write_access(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216898 Cc: <stable@kernel.org> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: zhanchengbin <zhanchengbin1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230110015327.1181863-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
34fc8768 |
|
07-Sep-2022 |
Andrew Perepechko <anserper@ya.ru> |
jbd2: wake up journal waiters in FIFO order, not LIFO LIFO wakeup order is unfair and sometimes leads to a journal user not being able to get a journal handle for hundreds of transactions in a row. FIFO wakeup can make things more fair. Cc: stable@kernel.org Signed-off-by: Alexey Lyashkov <alexey.lyashkov@gmail.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20220907165959.1137482-1-alexey.lyashkov@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4a734f08 |
|
15-Jul-2022 |
Zhihao Cheng <chengzhihao1@huawei.com> |
jbd2: fix assertion 'jh->b_frozen_data == NULL' failure when journal aborted Following process will fail assertion 'jh->b_frozen_data == NULL' in jbd2_journal_dirty_metadata(): jbd2_journal_commit_transaction unlink(dir/a) jh->b_transaction = trans1 jh->b_jlist = BJ_Metadata journal->j_running_transaction = NULL trans1->t_state = T_COMMIT unlink(dir/b) handle->h_trans = trans2 do_get_write_access jh->b_modified = 0 jh->b_frozen_data = frozen_buffer jh->b_next_transaction = trans2 jbd2_journal_dirty_metadata is_handle_aborted is_journal_aborted // return false --> jbd2 abort <-- while (commit_transaction->t_buffers) if (is_journal_aborted) jbd2_journal_refile_buffer __jbd2_journal_refile_buffer WRITE_ONCE(jh->b_transaction, jh->b_next_transaction) WRITE_ONCE(jh->b_next_transaction, NULL) __jbd2_journal_file_buffer(jh, BJ_Reserved) J_ASSERT_JH(jh, jh->b_frozen_data == NULL) // assertion failure ! The reproducer (See detail in [Link]) reports: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1629! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 2 PID: 584 Comm: unlink Tainted: G W 5.19.0-rc6-00115-g4a57a8400075-dirty #697 RIP: 0010:jbd2_journal_dirty_metadata+0x3c5/0x470 RSP: 0018:ffffc90000be7ce0 EFLAGS: 00010202 Call Trace: <TASK> __ext4_handle_dirty_metadata+0xa0/0x290 ext4_handle_dirty_dirblock+0x10c/0x1d0 ext4_delete_entry+0x104/0x200 __ext4_unlink+0x22b/0x360 ext4_unlink+0x275/0x390 vfs_unlink+0x20b/0x4c0 do_unlinkat+0x42f/0x4c0 __x64_sys_unlink+0x37/0x50 do_syscall_64+0x35/0x80 After journal aborting, __jbd2_journal_refile_buffer() is executed with holding @jh->b_state_lock, we can fix it by moving 'is_handle_aborted()' into the area protected by @jh->b_state_lock. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216251 Fixes: 470decc613ab20 ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20220715125152.4022726-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
cb3b3bf2 |
|
08-Jun-2022 |
Jan Kara <jack@suse.cz> |
jbd2: rename jbd_debug() to jbd2_debug() The name of jbd_debug() is confusing as all functions inside jbd2 have jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4f5bf127 |
|
12-May-2022 |
Yang Li <yang.lee@linux.alibaba.com> |
fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment Add the description of @folio and remove @page in function kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/jbd2/transaction.c:2149: warning: Function parameter or member 'folio' not described in 'jbd2_journal_try_to_free_buffers' fs/jbd2/transaction.c:2149: warning: Excess function parameter 'page' description in 'jbd2_journal_try_to_free_buffers' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/20220512075432.31763-1-yang.lee@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
68189fef |
|
30-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Change try_to_free_buffers() to take a folio All but two of the callers already have a folio; pass a folio into try_to_free_buffers(). This removes the last user of cancel_dirty_page() so remove that wrapper function too. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
c56a6eb0 |
|
30-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio Also convert it to return a bool since it's called from release_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
ccd16945 |
|
09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ext4: Convert invalidatepage to invalidate_folio Extensive changes, but fairly mechanical. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
|
#
2d442920 |
|
15-Feb-2022 |
Ritesh Harjani <riteshh@linux.ibm.com> |
jbd2: remove CONFIG_JBD2_DEBUG to update t_max_wait CONFIG_JBD2_DEBUG and jbd2_journal_enable_debug knobs were added in update_t_max_wait(), since earlier it used to take a spinlock for updating t_max_wait, which could cause a bottleneck while starting a txn (start_this_handle()). Since in previous patch, we have killed t_handle_lock completely, we could get rid of this debug config and knob to let t_max_wait be updated by default again. Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/ad7319a601fd501079310747ce87d908e0944763.1644992076.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
f7f497cb |
|
15-Feb-2022 |
Ritesh Harjani <riteshh@linux.ibm.com> |
jbd2: kill t_handle_lock transaction spinlock This patch kills t_handle_lock transaction spinlock completely from jbd2. To explain the reasoning, currently there were three sites at which this spinlock was used. 1. jbd2_journal_wait_updates() a. Based on careful code review it can be seen that, we don't need this lock here. This is since we wait for any currently ongoing updates based on a atomic variable t_updates. And we anyway don't take any t_handle_lock while in stop_this_handle(). i.e. write_lock(&journal->j_state_lock() jbd2_journal_wait_updates() stop_this_handle() while (atomic_read(txn->t_updates) { | DEFINE_WAIT(wait); | prepare_to_wait(); | if (atomic_read(txn->t_updates) if (atomic_dec_and_test(txn->t_updates)) write_unlock(&journal->j_state_lock); schedule(); wake_up() write_lock(&journal->j_state_lock); finish_wait(); } txn->t_state = T_COMMIT write_unlock(&journal->j_state_lock); b. Also note that between atomic_inc(&txn->t_updates) in start_this_handle() and jbd2_journal_wait_updates(), the synchronization happens via read_lock(journal->j_state_lock) in start_this_handle(); 2. jbd2_journal_extend() a. jbd2_journal_extend() is called with the handle of each process from task_struct. So no lock required in updating member fields of handle_t b. For member fields of h_transaction, all updates happens only via atomic APIs (which is also within read_lock()). So, no need of this transaction spinlock. 3. update_t_max_wait() Based on Jan suggestion, this can be carefully removed using atomic cmpxchg API. Note that there can be several processes which are waiting for a new transaction to be allocated and started. For doing this only one process will succeed in taking write_lock() and allocating a new txn. After that all of the process will be updating the t_max_wait (max transaction wait time). This can be done via below method w/o taking any locks using atomic cmpxchg. For more details refer [1] new = get_new_val(); old = READ_ONCE(ptr->max_val); while (old < new) old = cmpxchg(&ptr->max_val, old, new); [1]: https://lwn.net/Articles/849237/ Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d89e599658b4a1f3893a48c6feded200073037fc.1644992076.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
cc16eeca |
|
10-Feb-2022 |
Ritesh Harjani <riteshh@linux.ibm.com> |
jbd2: fix use-after-free of transaction_t race jbd2_journal_wait_updates() is called with j_state_lock held. But if there is a commit in progress, then this transaction might get committed and freed via jbd2_journal_commit_transaction() -> jbd2_journal_free_transaction(), when we release j_state_lock. So check for journal->j_running_transaction everytime we release and acquire j_state_lock to avoid use-after-free issue. Link: https://lore.kernel.org/r/948c2fed518ae739db6a8f7f83f1d58b504f87d0.1644497105.git.ritesh.list@gmail.com Fixes: 4f98186848707f53 ("jbd2: refactor wait logic for transaction updates into a common function") Cc: stable@kernel.org Reported-and-tested-by: syzbot+afa2ca5171d93e44b348@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4f981868 |
|
17-Jan-2022 |
Ritesh Harjani <riteshh@linux.ibm.com> |
jbd2: refactor wait logic for transaction updates into a common function No functionality change as such in this patch. This only refactors the common piece of code which waits for t_updates to finish into a common function named as jbd2_journal_wait_updates(journal_t *) Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/8c564f70f4b2591171677a2a74fccb22a7b6c3a4.1642416995.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
188c299e |
|
16-Aug-2021 |
Jan Kara <jack@suse.cz> |
ext4: Support for checksumming from journal triggers JBD2 layer support triggers which are called when journaling layer moves buffer to a certain state. We can use the frozen trigger, which gets called when buffer data is frozen and about to be written out to the journal, to compute block checksums for some buffer types (similarly as does ocfs2). This avoids unnecessary repeated recomputation of the checksum (at the cost of larger window where memory corruption won't be caught by checksumming) and is even necessary when there are unsynchronized updaters of the checksummed data. So add superblock and journal trigger type arguments to ext4_journal_get_write_access() and ext4_journal_get_create_access() so that frozen triggers can be set accordingly. Also add inode argument to ext4_walk_page_buffers() and all the callbacks used with that function for the same purpose. This patch is mostly only a change of prototype of the above mentioned functions and a few small helpers. Real checksumming will come later. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210816095713.16537-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
b33d9f59 |
|
14-Aug-2021 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: add sparse annotations for add_transaction_credits() Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
235d6806 |
|
10-Jun-2021 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: don't abort the journal when freeing buffers Now that we can be sure the journal is aborted once a buffer has failed to be written back to disk, we can remove the journal abort logic in jbd2_journal_try_to_free_buffers() which was introduced in commit c044f3d8360d ("jbd2: abort journal if free a async write error metadata buffer"), because it may cost and propably is not safe. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
83fe6b18 |
|
06-Apr-2021 |
Jan Kara <jack@suse.cz> |
ext4: annotate data race in jbd2_journal_dirty_metadata() Assertion checks in jbd2_journal_dirty_metadata() are known to be racy but we don't want to be grabbing locks just for them. We thus recheck them under b_state_lock only if it looks like they would fail. Annotate the checks with data_race(). Cc: stable@kernel.org Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
3b1833e9 |
|
06-Apr-2021 |
Jan Kara <jack@suse.cz> |
ext4: annotate data race in start_this_handle() Access to journal->j_running_transaction is not protected by appropriate lock and thus is racy. We are well aware of that and the code handles the race properly. Just add a comment and data_race() annotation. Cc: stable@kernel.org Reported-by: syzbot+30774a6acf6a2cf6d535@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210406161804.20150-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
2bf31d94 |
|
16-Nov-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
jbd2: fix kernel-doc markups Kernel-doc markup should use this format: identifier - description They should not have any type before that, as otherwise the parser won't do the right thing. Also, some identifiers have different names between their prototypes and the kernel-doc markup. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
05d5233d |
|
06-Nov-2020 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: fix up sparse warnings in checkpoint code Add missing __acquires() and __releases() annotations. Also, in an "this should never happen" WARN_ON check, if it *does* actually happen, we need to release j_state_lock since this function is always supposed to release that lock. Otherwise, things will quickly grind to a halt after the WARN_ON trips. Fixes: 96f1e0974575 ("jbd2: avoid long hold times of j_state_lock...") Cc: stable@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
529a781e |
|
19-Jun-2020 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: remove unused parameter in jbd2_journal_try_to_free_buffers() Parameter gfp_mask in jbd2_journal_try_to_free_buffers() is no longer used after commit <536fc240e7147> ("jbd2: clean up jbd2_journal_try_to_free_buffers()"), so just remove it. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200620025427.1756360-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c044f3d8 |
|
19-Jun-2020 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: abort journal if free a async write error metadata buffer If we free a metadata buffer which has been failed to async write out in the background, the jbd2 checkpoint procedure will not detect this failure in jbd2_log_do_checkpoint(), so it may lead to filesystem inconsistency after cleanup journal tail. This patch abort the journal if free a buffer has write_io_error flag to prevent potential further inconsistency. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
24dc9864 |
|
17-Jun-2020 |
Lukas Czerner <lczerner@redhat.com> |
jbd2: make sure jh have b_transaction set in refile/unfile_buffer Callers of __jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() assume that the b_transaction is set. In fact if it's not, we can end up with journal_head refcounting errors leading to crash much later that might be very hard to track down. Add asserts to make sure that is the case. We also make sure that b_next_transaction is NULL in __jbd2_journal_unfile_buffer() since the callers expect that as well and we should not get into that stage in this state anyway, leading to problems later on if we do. Tested with fstests. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
14ff6286 |
|
20-May-2020 |
Jan Kara <jack@suse.cz> |
jbd2: avoid leaking transaction credits when unreserving handle When reserved transaction handle is unused, we subtract its reserved credits in __jbd2_journal_unreserve_handle() called from jbd2_journal_stop(). However this function forgets to remove reserved credits from transaction->t_outstanding_credits and thus the transaction space that was reserved remains effectively leaked. The leaked transaction space can be quite significant in some cases and leads to unnecessarily small transactions and thus reducing throughput of the journalling machinery. E.g. fsmark workload creating lots of 4k files was observed to have about 20% lower throughput due to this when ext4 is mounted with dioread_nolock mount option. Subtract reserved credits from t_outstanding_credits as well. CC: stable@vger.kernel.org Fixes: 8f7d89f36829 ("jbd2: transaction reservation support") Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200520133119.1383-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6c5d9112 |
|
21-Feb-2020 |
Qian Cai <cai@lca.pw> |
jbd2: fix data races at struct journal_head journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ|WRITE_ONCE(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8eedabfd |
|
20-Feb-2020 |
wangyan <wangyan122@huawei.com> |
jbd2: fix ocfs2 corrupt when clearing block group bits I found a NULL pointer dereference in ocfs2_block_group_clear_bits(). The running environment: kernel version: 4.19 A cluster with two nodes, 5 luns mounted on two nodes, and do some file operations like dd/fallocate/truncate/rm on every lun with storage network disconnection. The fallocate operation on dm-23-45 caused an null pointer dereference. The information of NULL pointer dereference as follows: [577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45. [577992.878290] Aborting journal on device dm-23-45. ... [577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46. [577992.890908] __journal_remove_journal_head: freeing b_committed_data [577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30 [577992.890918] __journal_remove_journal_head: freeing b_committed_data [577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30 [577992.890922] __journal_remove_journal_head: freeing b_committed_data [577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30 [577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30 [577992.890928] __journal_remove_journal_head: freeing b_committed_data [577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30 [577992.890933] __journal_remove_journal_head: freeing b_committed_data [577992.890939] __journal_remove_journal_head: freeing b_committed_data [577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [577992.890950] Mem abort info: [577992.890951] ESR = 0x96000004 [577992.890952] Exception class = DABT (current EL), IL = 32 bits [577992.890952] SET = 0, FnV = 0 [577992.890953] EA = 0, S1PTW = 0 [577992.890954] Data abort info: [577992.890955] ISV = 0, ISS = 0x00000004 [577992.890956] CM = 0, WnR = 0 [577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9 [577992.890960] [0000000000000020] pgd=0000000000000000 [577992.890964] Internal error: Oops: 96000004 [#1] SMP [577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd) [577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1 [577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019 [577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO) [577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2] [577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2] [577992.891084] sp : ffff0000c8e2b810 [577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000 [577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70 [577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2 [577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30 [577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000 [577992.891097] x19: ffff000001681638 x18: ffffffffffffffff [577992.891098] x17: 0000000000000000 x16: ffff000080a03df0 [577992.891100] x15: ffff0000811d9708 x14: 203d207375746174 [577992.891101] x13: 73203a524f525245 x12: 20373439343a6565 [577992.891103] x11: 0000000000000038 x10: 0101010101010101 [577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f [577992.891109] x7 : 0000000000000000 x6 : 0000000000000080 [577992.891110] x5 : 0000000000000000 x4 : 0000000000000002 [577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00 [577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000 [577992.891116] Call trace: [577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2] [577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2] [577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2] [577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2] [577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2] [577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2] [577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2] [577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2] [577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2] [577992.891316] vfs_fallocate+0x11c/0x250 [577992.891317] ksys_fallocate+0x54/0x88 [577992.891319] __arm64_sys_fallocate+0x28/0x38 [577992.891323] el0_svc_common+0x78/0x130 [577992.891325] el0_svc_handler+0x38/0x78 [577992.891327] el0_svc+0x8/0xc My analysis process as follows: ocfs2_fallocate __ocfs2_change_file_space ocfs2_allocate_extents ocfs2_extend_allocation ocfs2_add_inode_data ocfs2_add_clusters_in_btree ocfs2_insert_extent ocfs2_do_insert_extent ocfs2_rotate_tree_right ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2_journal_restart jbd2__journal_restart /* handle->h_transaction is NULL, * is_handle_aborted(handle) is true */ handle->h_transaction = NULL; start_this_handle return -EROFS; ocfs2_free_clusters _ocfs2_free_clusters _ocfs2_free_suballoc_bits ocfs2_block_group_clear_bits ocfs2_journal_access_gd __ocfs2_journal_access jbd2_journal_get_undo_access /* I think jbd2_write_access_granted() will * return true, because do_get_write_access() * will return -EROFS. */ if (jbd2_write_access_granted(...)) return 0; do_get_write_access /* handle->h_transaction is NULL, it will * return -EROFS here, so do_get_write_access() * was not called. */ if (is_handle_aborted(handle)) return -EROFS; /* bh2jh(group_bh) is NULL, caused NULL pointer dereference */ undo_bg = (struct ocfs2_group_desc *) bh2jh(group_bh)->b_committed_data; If handle->h_transaction == NULL, then jbd2_write_access_granted() does not really guarantee that journal_head will stay around, not even speaking of its b_committed_data. The bh2jh(group_bh) can be removed after ocfs2_journal_access_gd() and before call "bh2jh(group_bh)->b_committed_data". So, we should move is_handle_aborted() check from do_get_write_access() into jbd2_journal_get_undo_access() and jbd2_journal_get_write_access() before the call to jbd2_write_access_granted(). Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com Signed-off-by: Yan Wang <wangyan122@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
|
#
6a66a7de |
|
12-Feb-2020 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() There is no need to delay the clearing of b_modified flag to the transaction committing time when unmapping the journalled buffer, so just move it to the journal_unmap_buffer(). Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
|
#
8d6ce136 |
|
22-Jan-2020 |
Shijie Luo <luoshijie1@huawei.com> |
ext4,jbd2: fix comment and code style Fix comment and remove unneccessary blank. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200123064325.36358-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0c1cba6c |
|
22-Jan-2020 |
wangyan <wangyan122@huawei.com> |
jbd2: delete the duplicated words in the comments Delete the duplicated words "is" in the comments Signed-off-by: Yan Wang <wangyan122@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/12087f77-ab4d-c7ba-53b4-893dbf0026f0@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
19014d69 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Fine tune estimate of necessary descriptor blocks Currently we reserve j_max_transaction_buffers / 32 for transaction descriptor blocks. Now that revoke descriptors are accounted for separately this estimate is unnecessarily high and we can actually compute much tighter estimate. In the common case of 32k journal blocks and 4k blocksize this actually reduces the amount of reserved descriptor blocks from 256 to ~25 which allows us to fit more real data into a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0094f981 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Provide trace event for handle restarts Provide trace event for handle restarts to ease debugging. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d090707e |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Make credit checking more strict Make checking of available credits in jbd2_journal_dirty_metadata() more strict. There should be always enough credits in the handle to write all potential revoke descriptors. Also we warn in case there are not enough credits since this is a bug in the filesystem. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
933f1c1e |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Rename h_buffer_credits to h_total_credits The credit counter now contains both buffer and revoke descriptor block credits. Rename to counter to h_total_credits to reflect that. No functional change. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
fdc3ef88 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Reserve space for revoke descriptor blocks Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped. On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
77444ac4 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Drop jbd2_space_needed() The function is now just a trivial wrapper returning journal->j_max_transaction_buffers. Drop it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
9f356e5a |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Account descriptor blocks into t_outstanding_credits Currently, journal descriptor blocks were not accounted in transaction->t_outstanding_credits and we were just leaving some slack space in the journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This is making proper accounting (and reservation we want to add) of descriptor blocks difficult so switch to accounting descriptor blocks in transaction->t_outstanding_credits and just reserve the same amount of credits in t_outstanding credits for journal descriptor blocks when creating transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ec8b6f60 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Factor out common parts of stopping and restarting a handle jbd2__journal_restart() has quite some code that is common with jbd2_journal_stop(). Factor this functionality into stop_this_handle() helper and use it from both functions. Note that this also drops t_handle_lock protection from jbd2__journal_restart() as jbd2_journal_stop() does the same thing without it. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
5559b2d8 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Drop pointless wakeup from jbd2_journal_stop() When we drop last handle from a transaction and journal->j_barrier_count > 0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked wait queue. This looks pointless - wait for outstanding handles always happens on journal->j_wait_updates waitqueue. journal->j_wait_transaction_locked is used to wait for transaction state changes and by start_this_handle() for waiting until journal->j_barrier_count drops to 0. The first case is clearly irrelevant here since only jbd2 thread changes transaction state. The second case looks related but jbd2_journal_unlock_updates() is responsible for the wakeup in this case. So just drop the wakeup. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
150549ed |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Drop pointless check from jbd2_journal_stop() If a transaction is larger than journal->j_max_transaction_buffers, that is a bug and not a trigger for transaction commit. Also the very next attempt to start new handle will start transaction commit anyway. So just remove the pointless check. Arguably, we could start transaction commit whenever the transaction size is *close* to journal->j_max_transaction_buffers. This has a potential to reduce latency of the next jbd2_journal_start() at the cost of somewhat smaller transactions. However for this to have any effect, it would mean that there isn't someone already waiting in jbd2_journal_start() which means metadata load for the fs is pretty light anyway so probably this optimization is not worth it. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
dfaf5ffd |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Reorganize jbd2_journal_stop() Move code in jbd2_journal_stop() around a bit. It removes some unnecessary code duplication and will make factoring out parts common with jbd2__journal_restart() easier. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
46417064 |
|
09-Aug-2019 |
Thomas Gleixner <tglx@linutronix.de> |
jbd2: Make state lock a spinlock Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held region because bit spinlocks disable preemption even on RT. A first attempt was to replace state lock with a spinlock placed in struct buffer_head and make the locking conditional on PREEMPT_RT and DEBUG_BIT_SPINLOCKS. Jan pointed out that there is a 4 byte hole in struct journal_head where a regular spinlock fits in and he would not object to convert the state lock to a spinlock unconditionally. Aside of solving the RT problem, this also gains lockdep coverage for the journal head state lock (bit-spinlocks are not covered by lockdep as it's hard to fit a lockdep map into a single bit). The trivial change would have been to convert the jbd_*lock_bh_state() inlines, but that comes with the downside that these functions take a buffer head pointer which needs to be converted to a journal head pointer which adds another level of indirection. As almost all functions which use this lock have a journal head pointer readily available, it makes more sense to remove the lock helper inlines and write out spin_*lock() at all call sites. Fixup all locking comments as well. Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
2e710ff0 |
|
09-Aug-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Don't call __bforget() unnecessarily jbd2_journal_forget() jumps to 'not_jbd' branch which calls __bforget() in cases where the buffer is clean which is pointless. In case of failed assertion, it can be even argued that it is safer not to touch buffer's dirty bits. Also logically it makes more sense to just jump to 'drop' and that will make logic also simpler when we switch bh_state_lock to a spinlock. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-6-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6d69843e |
|
09-Aug-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Drop unnecessary branch from jbd2_journal_forget() We have cleared both dirty & jbddirty bits from the bh. So there's no difference between bforget() and brelse(). Thus there's no point jumping to no_jbd branch. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-5-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
93108ebb |
|
09-Aug-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Move dropping of jh reference out of un/re-filing functions __jbd2_journal_unfile_buffer() and __jbd2_journal_refile_buffer() drop transaction's jh reference when they remove jh from a transaction. This will be however inconvenient once we move state lock into journal_head itself as we still need to unlock it and we'd need to grab jh reference just for that. Move dropping of jh reference out of these functions into the few callers. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d84560f7 |
|
09-Aug-2019 |
Thomas Gleixner <tglx@linutronix.de> |
jbd2: Simplify journal_unmap_buffer() journal_unmap_buffer() checks first whether the buffer head is a journal. If so it takes locks and then invokes jbd2_journal_grab_journal_head() followed by another check whether this is journal head buffer. The double checking is pointless. Replace the initial check with jbd2_journal_grab_journal_head() which alredy checks whether the buffer head is actually a journal. Allows also early access to the journal head pointer for the upcoming conversion of state lock to a regular spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: linux-ext4@vger.kernel.org Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
5facae4f |
|
18-Sep-2019 |
Qian Cai <cai@lca.pw> |
locking/lockdep: Remove unused @nested argument from lock_release() Since the following commit: b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release") @nested is no longer used in lock_release(), so remove it from all lock_release() calls and friends. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: alexander.levin@microsoft.com Cc: daniel@iogearbox.net Cc: davem@davemloft.net Cc: dri-devel@lists.freedesktop.org Cc: duyuyang@gmail.com Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: intel-gfx@lists.freedesktop.org Cc: jack@suse.com Cc: jlbec@evilplan.or Cc: joonas.lahtinen@linux.intel.com Cc: joseph.qi@linux.alibaba.com Cc: jslaby@suse.com Cc: juri.lelli@redhat.com Cc: maarten.lankhorst@linux.intel.com Cc: mark@fasheh.com Cc: mhocko@kernel.org Cc: mripard@kernel.org Cc: ocfs2-devel@oss.oracle.com Cc: rodrigo.vivi@intel.com Cc: sean@poorly.run Cc: st@kernel.org Cc: tj@kernel.org Cc: tytso@mit.edu Cc: vdavydov.dev@gmail.com Cc: vincent.guittot@linaro.org Cc: viro@zeniv.linux.org.uk Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
963abb9a |
|
23-Sep-2019 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
jbd2: remove jbd2_journal_inode_add_[write|wait] Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now, jbd2_journal_inode_add_[write|wait] are not used any more, remove them. Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Acked-by: Changwei Ge <chge@linux.alibaba.com> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4c273352 |
|
24-Aug-2019 |
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> |
jbd2: add missing tracepoint for reserved handle This issue was found when I use ebpf to trace every jbd2 handle's running info in dioread_nolock case. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6ba0e7dc |
|
20-Jun-2019 |
Ross Zwisler <zwisler@chromium.org> |
jbd2: introduce jbd2_inode dirty range scoping Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is do just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback & wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jdb2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
|
#
0d52154b |
|
10-May-2019 |
Chengguang Xu <cgxu519@gmail.com> |
jbd2: fix potential double free When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
|
#
0df6f469 |
|
28-Feb-2019 |
Liu Song <fishland@aliyun.com> |
jbd2: jbd2_get_transaction does not need to return a value In jbd2_get_transaction, a new transaction is initialized, and set to the j_running_transaction. No need for a return value, so remove it. Also, adjust some comments to match the actual operation of this function. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
01215d3e |
|
21-Feb-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: fix compile warning when using JBUFFER_TRACE The jh pointer may be used uninitialized in the two cases below and the compiler complain about it when enabling JBUFFER_TRACE macro, fix them. In file included from fs/jbd2/transaction.c:19:0: fs/jbd2/transaction.c: In function ‘jbd2_journal_get_undo_access’: ./include/linux/jbd2.h:1637:38: warning: ‘jh’ is used uninitialized in this function [-Wuninitialized] #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0) ^ fs/jbd2/transaction.c:1219:23: note: ‘jh’ was declared here struct journal_head *jh; ^ In file included from fs/jbd2/transaction.c:19:0: fs/jbd2/transaction.c: In function ‘jbd2_journal_dirty_metadata’: ./include/linux/jbd2.h:1637:38: warning: ‘jh’ may be used uninitialized in this function [-Wmaybe-uninitialized] #define JBUFFER_TRACE(jh, info) do { printk("%s: %d\n", __func__, jh->b_jcount);} while (0) ^ fs/jbd2/transaction.c:1332:23: note: ‘jh’ was declared here struct journal_head *jh; ^ Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Reviewed-by: Jan Kara <jack@suse.cz>
|
#
59759926 |
|
10-Feb-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: discard dirty data when forgetting an un-journalled buffer We do not unmap and clear dirty flag when forgetting a buffer without journal or does not belongs to any transaction, so the invalid dirty data may still be written to the disk later. It's fine if the corresponding block is never used before the next mount, and it's also fine that we invoke clean_bdev_aliases() related functions to unmap the block device mapping when re-allocating such freed block as data block. But this logic is somewhat fragile and risky that may lead to data corruption if we forget to clean bdev aliases. So, It's better to discard dirty data during forget time. We have been already handled all the cases of forgetting journalled buffer, this patch deal with the remaining two cases. - buffer is not journalled yet, - buffer is journalled but doesn't belongs to any transaction. We invoke __bforget() instead of __brelese() when forgetting an un-journalled buffer in jbd2_journal_forget(). After this patch we can remove all clean_bdev_aliases() related calls in ext4. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
904cdbd4 |
|
10-Feb-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: clear dirty flag when revoking a buffer from an older transaction Now, we capture a data corruption problem on ext4 while we're truncating an extent index block. Imaging that if we are revoking a buffer which has been journaled by the committing transaction, the buffer's jbddirty flag will not be cleared in jbd2_journal_forget(), so the commit code will set the buffer dirty flag again after refile the buffer. fsx kjournald2 jbd2_journal_commit_transaction jbd2_journal_revoke commit phase 1~5... jbd2_journal_forget belongs to older transaction commit phase 6 jbddirty not clear __jbd2_journal_refile_buffer __jbd2_journal_unfile_buffer test_clear_buffer_jbddirty mark_buffer_dirty Finally, if the freed extent index block was allocated again as data block by some other files, it may corrupt the file data after writing cached pages later, such as during unmount time. (In general, clean_bdev_aliases() related helpers should be invoked after re-allocation to prevent the above corruption, but unfortunately we missed it when zeroout the head of extra extent blocks in ext4_ext_handle_unwritten_extents()). This patch mark buffer as freed and set j_next_transaction to the new transaction when it already belongs to the committing transaction in jbd2_journal_forget(), so that commit code knows it should clear dirty bits when it is done with the buffer. This problem can be reproduced by xfstests generic/455 easily with seeds (3246 3247 3248 3249). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
|
#
561405f0 |
|
03-Dec-2018 |
Colin Ian King <colin.king@canonical.com> |
jbd2: clean up indentation issue, replace spaces with tab There is a statement that is indented with spaces, replace it with a tab. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
96f1e097 |
|
03-Dec-2018 |
Jan Kara <jack@suse.cz> |
jbd2: avoid long hold times of j_state_lock while committing a transaction We can hold j_state_lock for writing at the beginning of jbd2_journal_commit_transaction() for a rather long time (reportedly for 30 ms) due cleaning revoke bits of all revoked buffers under it. The handling of revoke tables as well as cleaning of t_reserved_list, and checkpoint lists does not need j_state_lock for anything. It is only needed to prevent new handles from joining the transaction. Generally T_LOCKED transaction state prevents new handles from joining the transaction - except for reserved handles which have to allowed to join while we wait for other handles to complete. To prevent reserved handles from joining the transaction while cleaning up lists, add new transaction state T_SWITCH and watch for it when starting reserved handles. With this we can just drop the lock for operations that don't need it. Reported-and-tested-by: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: "Theodore Y. Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e09463f2 |
|
16-Jun-2018 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: don't mark block as modified if the handle is out of credits Do not set the b_modified flag in block's journal head should not until after we're sure that jbd2_journal_dirty_metadat() will not abort with an error due to there not being enough space reserved in the jbd2 handle. Otherwise, future attempts to modify the buffer may lead a large number of spurious errors and warnings. This addresses CVE-2018-10883. https://bugzilla.kernel.org/show_bug.cgi?id=200071 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
|
#
8bdd5b60 |
|
20-May-2018 |
Wang Long <wanglong19@meituan.com> |
jbd2: remove NULL check before calling kmem_cache_destroy() The kmem_cache_destroy() function already checks for null pointers, so we can remove the check at the call site. This patch also sets jbd2_handle_cache and jbd2_inode_cache to be NULL after freeing them in jbd2_journal_destroy_handle_cache(). Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
b2569260 |
|
18-Apr-2018 |
Theodore Ts'o <tytso@mit.edu> |
ext4: set h_journal if there is a failure starting a reserved handle If ext4 tries to start a reserved handle via jbd2_journal_start_reserved(), and the journal has been aborted, this can result in a NULL pointer dereference. This is because the fields h_journal and h_transaction in the handle structure share the same memory, via a union, so jbd2_journal_start_reserved() will clear h_journal before calling start_this_handle(). If this function fails due to an aborted handle, h_journal will still be NULL, and the call to jbd2_journal_free_reserved() will pass a NULL journal to sub_reserve_credits(). This can be reproduced by running "kvm-xfstests -c dioread_nolock generic/475". Cc: stable@kernel.org # 3.11 Fixes: 8f7d89f36829b ("jbd2: transaction reservation support") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
f69120ce |
|
09-Jan-2018 |
Tobin C. Harding <me@tobin.cc> |
jbd2: fix sphinx kernel-doc build warnings Sphinx emits various (26) warnings when building make target 'htmldocs'. Currently struct definitions contain duplicate documentation, some as kernel-docs and some as standard c89 comments. We can reduce duplication while cleaning up the kernel docs. Move all kernel-docs to right above each struct member. Use the set of all existing comments (kernel-doc and c89). Add documentation for missing struct members and function arguments. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
f5166768 |
|
17-Dec-2017 |
Theodore Ts'o <tytso@mit.edu> |
ext4: fix up remaining files with SPDX cleanups A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
b4709067 |
|
21-May-2017 |
Tahsin Erdogan <tahsin@google.com> |
jbd2: preserve original nofs flag during journal restart When a transaction starts, start_this_handle() saves current PF_MEMALLOC_NOFS value so that it can be restored at journal stop time. Journal restart is a special case that calls start_this_handle() without stopping the transaction. start_this_handle() isn't aware that the original value is already stored so it overwrites it with current value. For instance, a call sequence like below leaves PF_MEMALLOC_NOFS flag set at the end: jbd2_journal_start() jbd2__journal_restart() jbd2_journal_stop() Make jbd2__journal_restart() restore the original value before calling start_this_handle(). Fixes: 81378da64de6 ("jbd2: mark the transaction context with the scope GFP_NOFS context") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
df1b560a |
|
12-May-2017 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
fs: jbd2: escape a string with special chars on a kernel-doc kernel-doc will try to interpret a foo() string, except if properly escaped. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
91e4775d |
|
12-May-2017 |
Mauro Carvalho Chehab <mchehab@kernel.org> |
fs: jbd2: make jbd2_journal_start() kernel-doc parseable kernel-doc script expects that a function documentation to be just before the function, otherwise it will be ignored. So, move the kernel-doc markup to the right place. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
|
#
81378da6 |
|
03-May-2017 |
Michal Hocko <mhocko@suse.com> |
jbd2: mark the transaction context with the scope GFP_NOFS context now that we have memalloc_nofs_{save,restore} api we can mark the whole transaction context as implicitly GFP_NOFS. All allocations will automatically inherit GFP_NOFS this way. This means that we do not have to mark any of those requests with GFP_NOFS and moreover all the ext4_kv[mz]alloc(GFP_NOFS) are also safe now because even the hardcoded GFP_KERNEL allocations deep inside the vmalloc will be NOFS now. [akpm@linux-foundation.org: tweak comments] Link: http://lkml.kernel.org/r/20170306131408.9828-7-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e112666b |
|
04-Feb-2017 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: don't leak modified metadata buffers on an aborted journal If the journal has been aborted, we shouldn't mark the underlying buffer head as dirty, since that will cause the metadata block to get modified. And if the journal has been aborted, we shouldn't allow this since it will almost certainly lead to a corrupted file system. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
559cce69 |
|
12-Oct-2016 |
Taesoo Kim <tsgatesv@gmail.com> |
jbd2: fix incorrect unlock on j_list_lock When 'jh->b_transaction == transaction' (asserted by below) J_ASSERT_JH(jh, (jh->b_transaction == transaction || ... 'journal->j_list_lock' will be incorrectly unlocked, since the the lock is aquired only at the end of if / else-if statements (missing the else case). Signed-off-by: Taesoo Kim <tsgatesv@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Fixes: 6e4862a5bb9d12be87e4ea5d9a60836ebed71d28 Cc: stable@vger.kernel.org # 3.14+
|
#
e03a9976 |
|
22-Sep-2016 |
Jan Kara <jack@suse.cz> |
jbd2: fix lockdep annotation in add_transaction_credits() Thomas has reported a lockdep splat hitting in add_transaction_credits(). The problem is that that function calls jbd2_might_wait_for_commit() while holding j_state_lock which is wrong (we do not really wait for transaction commit while holding that lock). Fix the problem by moving jbd2_might_wait_for_commit() into places where we are ready to wait for transaction commit and thus j_state_lock is unlocked. Cc: stable@vger.kernel.org Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
1eaa566d |
|
30-Jun-2016 |
Jan Kara <jack@suse.cz> |
jbd2: track more dependencies on transaction commit So far we were tracking only dependency on transaction commit due to starting a new handle (which may require commit to start a new transaction). Now add tracking also for other cases where we wait for transaction commit. This way lockdep can catch deadlocks e. g. because we call jbd2_journal_stop() for a synchronous handle with some locks held which rank below transaction start. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ab714aff |
|
30-Jun-2016 |
Jan Kara <jack@suse.cz> |
jbd2: move lockdep tracking to journal_s Currently lockdep map is tracked in each journal handle. To be able to expand lockdep support to cover also other cases where we depend on transaction commit and where handle is not available, move lockdep map into struct journal_s. Since this makes the lockdep map shared for all handles, we have to use rwsem_acquire_read() for acquisitions now. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
7a4b188f |
|
30-Jun-2016 |
Jan Kara <jack@suse.cz> |
jbd2: move lockdep instrumentation for jbd2 handles The transaction the handle references is free to commit once we've decremented t_updates counter. Move the lockdep instrumentation to that place. Currently it was a bit later which did not really matter but subsequent improvements to lockdep instrumentation would cause false positives with it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
41617e1a |
|
23-Apr-2016 |
Jan Kara <jack@suse.cz> |
jbd2: add support for avoiding data writes during transaction commits Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
bd7ced98 |
|
02-Feb-2016 |
Masanari Iida <standby24x7@gmail.com> |
Doc: treewide : Fix typos in DocBook/filesystem.xml This patch fix spelling typos found in DocBook/filesystem.xml. It is because the file was generated from comments in code, I have to fix the comments in codes, instead of xml file. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
490c1b44 |
|
13-Mar-2016 |
Michal Hocko <mhocko@suse.com> |
jbd2: do not fail journal because of frozen_buffer allocation failure Journal transaction might fail prematurely because the frozen_buffer is allocated by GFP_NOFS request: [ 72.440013] do_get_write_access: OOM for frozen_buffer [ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory (...snipped....) [ 72.495559] do_get_write_access: OOM for frozen_buffer [ 72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.496839] do_get_write_access: OOM for frozen_buffer [ 72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.505766] Aborting journal on device sda1-8. [ 72.505851] EXT4-fs (sda1): Remounting filesystem read-only This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" because small GPF_NOFS allocations never failed. This allocation seems essential for the journal and GFP_NOFS is too restrictive to the memory allocator so let's use __GFP_NOFAIL here to emulate the previous behavior. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
a1c6f057 |
|
13-Apr-2015 |
Dmitry Monakhov <dmonakhov@openvz.org> |
fs: use block_device name vsprintf helper Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
087ffd4e |
|
03-Dec-2015 |
Junxiao Bi <junxiao.bi@oracle.com> |
jbd2: fix null committed data return in undo_access introduced jbd2_write_access_granted() to improve write|undo_access speed, but missed to check the status of b_committed_data which caused a kernel panic on ocfs2. [ 6538.405938] ------------[ cut here ]------------ [ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400! [ 6538.406686] invalid opcode: 0000 [#1] SMP [ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod [ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1 [ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014 [ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000 [ 6538.406686] RIP: 0010:[<ffffffffa06a286b>] [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2] [ 6538.406686] RSP: 0018:ffff880075b7b7f8 EFLAGS: 00010246 [ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0 [ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430 [ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001 [ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001 [ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0 [ 6538.406686] FS: 00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000 [ 6538.406686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0 [ 6538.406686] Stack: [ 6538.406686] ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800 [ 6538.406686] ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d [ 6538.406686] ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000 [ 6538.406686] Call Trace: [ 6538.406686] [<ffffffffa06a309d>] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2] [ 6538.406686] [<ffffffffa06a3654>] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa06a397e>] _ocfs2_free_clusters+0xee/0x210 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa0682d50>] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2] [ 6538.406686] [<ffffffffa06a3ad5>] ocfs2_free_clusters+0x15/0x20 [ocfs2] [ 6538.406686] [<ffffffffa065072c>] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2] [ 6538.406686] [<ffffffffa06843ac>] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2] [ 6538.406686] [<ffffffffa0654600>] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2] [ 6538.406686] [<ffffffffa0654394>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2] [ 6538.406686] [<ffffffffa065acd4>] ocfs2_remove_btree_range+0x374/0x630 [ocfs2] [ 6538.406686] [<ffffffffa017486b>] ? jbd2_journal_stop+0x25b/0x470 [jbd2] [ 6538.406686] [<ffffffffa065d5b5>] ocfs2_commit_truncate+0x305/0x670 [ocfs2] [ 6538.406686] [<ffffffffa0683430>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2] [ 6538.406686] [<ffffffffa067adb7>] ocfs2_truncate_file+0x297/0x380 [ocfs2] [ 6538.406686] [<ffffffffa01759e4>] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2] [ 6538.406686] [<ffffffffa067c7a2>] ocfs2_setattr+0x572/0x860 [ocfs2] [ 6538.406686] [<ffffffff810e4a3f>] ? current_fs_time+0x3f/0x50 [ 6538.406686] [<ffffffff812124b7>] notify_change+0x1d7/0x340 [ 6538.406686] [<ffffffff8121abf9>] ? generic_getxattr+0x79/0x80 [ 6538.406686] [<ffffffff811f5876>] do_truncate+0x66/0x90 [ 6538.406686] [<ffffffff81120e30>] ? __audit_syscall_entry+0xb0/0x110 [ 6538.406686] [<ffffffff811f5bb3>] do_sys_ftruncate.clone.0+0xf3/0x120 [ 6538.406686] [<ffffffff811f5bee>] SyS_ftruncate+0xe/0x10 [ 6538.406686] [<ffffffff816aa2ae>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 [ 6538.406686] RIP [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2] [ 6538.406686] RSP <ffff880075b7b7f8> [ 6538.691128] ---[ end trace 31cd7011d6770d7e ]--- [ 6538.694492] Kernel panic - not syncing: Fatal exception [ 6538.695484] Kernel Offset: disabled Fixes: de92c8caf16c("jbd2: speedup jbd2_journal_get_[write|undo]_access()") Cc: <stable@vger.kernel.org> Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
bc23f0c8 |
|
24-Nov-2015 |
Jan Kara <jack@suse.cz> |
jbd2: Fix unreclaimed pages after truncate in data=journal mode Ted and Namjae have reported that truncated pages don't get timely reclaimed after being truncated in data=journal mode. The following test triggers the issue easily: for (i = 0; i < 1000; i++) { pwrite(fd, buf, 1024*1024, 0); fsync(fd); fsync(fd); ftruncate(fd, 0); } The reason is that journal_unmap_buffer() finds that truncated buffers are not journalled (jh->b_transaction == NULL), they are part of checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have been already written out (!buffer_dirty(bh)). We clean such buffers but we leave them in the checkpoint list. Since checkpoint transaction holds a reference to the journal head, these buffers cannot be released until the checkpoint transaction is cleaned up. And at that point we don't call release_buffer_page() anymore so pages detached from mapping are lingering in the system waiting for reclaim to find them and free them. Fix the problem by removing buffers from transaction checkpoint lists when journal_unmap_buffer() finds out they don't have to be there anymore. Reported-and-tested-by: Namjae Jeon <namjae.jeon@samsung.com> Fixes: de1b794130b130e77ffa975bb58cb843744f9ae5 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
d0164adc |
|
06-Nov-2015 |
Mel Gorman <mgorman@techsingularity.net> |
mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6d3ec14d |
|
04-Aug-2015 |
Lukas Czerner <lczerner@redhat.com> |
jbd2: limit number of reserved credits Currently there is no limitation on number of reserved credits we can ask for. If we ask for more reserved credits than 1/2 of maximum transaction size, or if total number of credits exceeds the maximum transaction size per operation (which is currently only possible with the former) we will spin forever in start_this_handle(). Fix this by adding this limitation at the start of start_this_handle(). This patch also removes the credit limitation 1/2 of maximum transaction size, since we really only want to limit the number of reserved credits. There is not much point to limit the credits if there is still space in the journal. This accidentally also fixes the online resize, where due to the limitation of the journal credits we're unable to grow file systems with 1k block size and size between 16M and 32M. It has been partially fixed by 2c869b262a10ca99cb866d04087d75311587a30c, but not entirely. Thanks Jan Kara for helping me getting the correct fix. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6e06ae88 |
|
12-Jul-2015 |
Jan Kara <jack@suse.cz> |
jbd2: speedup jbd2_journal_dirty_metadata() It is often the case that we mark buffer as having dirty metadata when the buffer is already in that state (frequent for bitmaps, inode table blocks, superblock). Thus it is unnecessary to contend on grabbing journal head reference and bh_state lock. Avoid that by checking whether any modification to the buffer is needed before grabbing any locks or references. [ Note: this is a fixed version of commit 2143c1965a761, which was reverted in ebeaa8ddb3663b5 due to a false positive triggering of an assertion check. -- Ted ] Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ebeaa8dd |
|
27-Jun-2015 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "jbd2: speedup jbd2_journal_dirty_metadata()" This reverts commit 2143c1965a761332ae417b22fd477b636e4f54ec. This commit seems to be the cause of the following jbd2 assertion failure: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1325! invalid opcode: 0000 [#1] SMP Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ... CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679b411 #1 Hardware name: /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014 task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000 RIP: jbd2_journal_dirty_metadata+0x237/0x290 Call Trace: __ext4_handle_dirty_metadata+0x43/0x1f0 ext4_handle_dirty_dirent_node+0xde/0x160 ? jbd2_journal_get_write_access+0x36/0x50 ext4_delete_entry+0x112/0x160 ? __ext4_journal_start_sb+0x52/0xb0 ext4_unlink+0xfa/0x260 vfs_unlink+0xec/0x190 do_unlinkat+0x24a/0x270 SyS_unlink+0x11/0x20 entry_SYSCALL_64_fastpath+0x12/0x6a ---[ end trace ae033ebde8d080b4 ]--- which is not easily reproducible (I've seen it just once, and then Ted was able to reproduce it once). Revert it while Ted and Jan try to figure out what is wrong. Cc: Jan Kara <jack@suse.cz> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2143c196 |
|
20-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: speedup jbd2_journal_dirty_metadata() It is often the case that we mark buffer as having dirty metadata when the buffer is already in that state (frequent for bitmaps, inode table blocks, superblock). Thus it is unnecessary to contend on grabbing journal head reference and bh_state lock. Avoid that by checking whether any modification to the buffer is needed before grabbing any locks or references. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
de92c8ca |
|
07-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: speedup jbd2_journal_get_[write|undo]_access() jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are frequently called for buffers that are already part of the running transaction - most frequently it is the case for bitmaps, inode table blocks, and superblock. Since in such cases we have nothing to do, it is unfortunate we still grab reference to journal head, lock the bh, lock bh_state only to find out there's nothing to do. Improving this is a bit subtle though since until we find out journal head is attached to the running transaction, it can disappear from under us because checkpointing / commit decided it's no longer needed. We deal with this by protecting journal_head slab with RCU. We still have to be careful about journal head being freed & reallocated within slab and about exposing journal head in consistent state (in particular b_modified and b_frozen_data must be in correct state before we allow user to touch the buffer). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8b00f400 |
|
07-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: more simplifications in do_get_write_access() Check for the simple case of unjournaled buffer first, handle it and bail out. This allows us to remove one if and unindent the difficult case by one tab. The result is easier to read. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d012aa59 |
|
07-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: simplify error path on allocation failure in do_get_write_access() We were acquiring bh_state_lock when allocation of buffer failed in do_get_write_access() only to be able to jump to a label that releases the lock and does all other checks that don't make sense for this error path. Just jump into the right label instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ee57aba1 |
|
07-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: simplify code flow in do_get_write_access() needs_copy is set only in one place in do_get_write_access(), just move the frozen buffer copying into that place and factor it out to a separate function to make do_get_write_access() slightly more readable. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6ccaf3e2 |
|
08-Jun-2015 |
Michal Hocko <mhocko@suse.cz> |
jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL This basically reverts 47def82672b3 (jbd2: Remove __GFP_NOFAIL from jbd2 layer). The deprecation of __GFP_NOFAIL was a bad choice because it led to open coding the endless loop around the allocator rather than removing the dependency on the non failing allocation. So the deprecation was a clear failure and the reality tells us that __GFP_NOFAIL is not even close to go away. It is still true that __GFP_NOFAIL allocations are generally discouraged and new uses should be evaluated and an alternative (pre-allocations or reservations) should be considered but it doesn't make any sense to lie the allocator about the requirements. Allocator can take steps to help making a progress if it knows the requirements. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Acked-by: David Rientjes <rientjes@google.com>
|
#
9d506594 |
|
14-May-2015 |
Lukas Czerner <lczerner@redhat.com> |
ext4: fix NULL pointer dereference when journal restart fails Currently when journal restart fails, we'll have the h_transaction of the handle set to NULL to indicate that the handle has been effectively aborted. We handle this situation quietly in the jbd2_journal_stop() and just free the handle and exit because everything else has been done before we attempted (and failed) to restart the journal. Unfortunately there are a number of problems with that approach introduced with commit 41a5b913197c "jbd2: invalidate handle if jbd2_journal_restart() fails" First of all in ext4 jbd2_journal_stop() will be called through __ext4_journal_stop() where we would try to get a hold of the superblock by dereferencing h_transaction which in this case would lead to NULL pointer dereference and crash. In addition we're going to free the handle regardless of the refcount which is bad as well, because others up the call chain will still reference the handle so we might potentially reference already freed memory. Moreover it's expected that we'll get aborted handle as well as detached handle in some of the journalling function as the error propagates up the stack, so it's unnecessary to call WARN_ON every time we get detached handle. And finally we might leak some memory by forgetting to free reserved handle in jbd2_journal_stop() in the case where handle was detached from the transaction (h_transaction is NULL). Fix the NULL pointer dereference in __ext4_journal_stop() by just calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix the potential memory leak in jbd2_journal_stop() and use proper handle refcounting before we attempt to free it to avoid use-after-free issues. And finally remove all WARN_ON(!transaction) from the code so that we do not get random traces when something goes wrong because when journal restart fails we will get to some of those functions. Cc: stable@vger.kernel.org Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
74316201 |
|
06-Jul-2014 |
NeilBrown <neilb@suse.de> |
sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
5dd21424 |
|
05-Jul-2014 |
Eric Sandeen <sandeen@redhat.com> |
ext4: disable synchronous transaction batching if max_batch_time==0 The mount manpage says of the max_batch_time option, This optimization can be turned off entirely by setting max_batch_time to 0. But the code doesn't do that. So fix the code to do that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
66a4cb18 |
|
12-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: improve error messages for inconsistent journal heads Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
0bfea811 |
|
08-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
6e4862a5 |
|
08-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: minimize region locked by j_list_lock in journal_get_create_access() It's not needed until we start trying to modifying fields in the journal_head which are protected by j_list_lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d2eb0b99 |
|
08-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: check jh->b_transaction without taking j_list_lock jh->b_transaction is adequately protected for reading by the jbd_lock_bh_state(bh), so we don't need to take j_list_lock in __journal_try_to_free_buffer(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
92e3b405 |
|
17-Feb-2014 |
Dan Carpenter <dan.carpenter@oracle.com> |
jbd2: fix use after free in jbd2_journal_start_reserved() If start_this_handle() fails then it leads to a use after free of "handle". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
a67c848a |
|
08-Dec-2013 |
Dmitry Monakhov <dmonakhov@openvz.org> |
jbd2: rename obsoleted msg JBD->JBD2 Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
75685071 |
|
08-Dec-2013 |
Jan Kara <jack@suse.cz> |
jbd2: revise KERN_EMERG error messages Some of KERN_EMERG printk messages do not really deserve this log level and the one in log_wait_commit() is even rather useless (the journal has been previously aborted and *that* is where we should have been complaining). So make some messages just KERN_ERR and remove the useless message. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f6c07cad |
|
08-Dec-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: don't BUG but return ENOSPC if a handle runs out of space If a handle runs out of space, we currently stop the kernel with a BUG in jbd2_journal_dirty_metadata(). This makes it hard to figure out what might be going on. So return an error of ENOSPC, so we can let the file system layer figure out what is going on, to make it more likely we can get useful debugging information). This should make it easier to debug problems such as the one which was reported by: https://bugzilla.kernel.org/show_bug.cgi?id=44731 The only two callers of this function are ext4_handle_dirty_metadata() and ocfs2_journal_dirty(). The ocfs2 function will trigger a BUG_ON(), which means there will be no change in behavior. The ext4 function will call ext4_error_inode() which will print the useful debugging information and then handle the situation using ext4's error handling mechanisms (i.e., which might mean halting the kernel or remounting the file system read-only). Also, since both file systems already call WARN_ON(), drop the WARN_ON from jbd2_journal_dirty_metadata() to avoid two stack traces from being displayed. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <jlbec@evilplan.org>
|
#
41a5b913 |
|
01-Jul-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: invalidate handle if jbd2_journal_restart() fails If jbd2_journal_restart() fails the handle will have been disconnected from the current transaction. In this situation, the handle must not be used for for any jbd2 function other than jbd2_journal_stop(). Enforce this with by treating a handle which has a NULL transaction pointer as an aborted handle, and issue a kernel warning if jbd2_journal_extent(), jbd2_journal_get_write_access(), jbd2_journal_dirty_metadata(), etc. is called with an invalid handle. This commit also fixes a bug where jbd2_journal_stop() would trip over a kernel jbd2 assertion check when trying to free an invalid handle. Also move the responsibility of setting current->journal_info to start_this_handle(), simplifying the three users of this function. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Younger Liu <younger.liu@huawei.com> Cc: Jan Kara <jack@suse.cz>
|
#
39c04153 |
|
01-Jul-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: fix theoretical race in jbd2__journal_restart Once we decrement transaction->t_updates, if this is the last handle holding the transaction from closing, and once we release the t_handle_lock spinlock, it's possible for the transaction to commit and be released. In practice with normal kernels, this probably won't happen, since the commit happens in a separate kernel thread and it's unlikely this could all happen within the space of a few CPU cycles. On the other hand, with a real-time kernel, this could potentially happen, so save the tid found in transaction->t_tid before we release t_handle_lock. It would require an insane configuration, such as one where the jbd2 thread was set to a very high real-time priority, perhaps because a high priority real-time thread is trying to read or write to a file system. But some people who use real-time kernels have been known to do insane things, including controlling laser-wielding industrial robots. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
9ff86446 |
|
12-Jun-2013 |
Dmitry Monakhov <dmonakhov@openvz.org> |
jbd2: optimize jbd2_journal_force_commit Current implementation of jbd2_journal_force_commit() is suboptimal because result in empty and useless commits. But callers just want to force and wait any unfinished commits. We already have jbd2_journal_force_commit_nested() which does exactly what we want, except we are guaranteed that we do not hold journal transaction open. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8f7d89f3 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: transaction reservation support In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
fe1e8db5 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend() jbd2_journal_extend() first checked whether transaction can accept extending handle with more credits and then added credits to t_outstanding_credits. This can race with start_this_handle() adding another handle to a transaction and thus overbooking a transaction. Make jbd2_journal_extend() use atomic_add_return() to close the race. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
76c39904 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: cleanup needed free block estimates when starting a transaction __jbd2_log_space_left() and jbd_space_needed() were kind of odd. jbd_space_needed() accounted also credits needed for currently committing transaction while it didn't account for credits needed for control blocks. __jbd2_log_space_left() then accounted for control blocks as a fraction of free space. Since results of these two functions are always only compared against each other, this works correct but is somewhat strange. Move the estimates so that jbd_space_needed() returns number of blocks needed for a transaction including control blocks and __jbd2_log_space_left() returns free space in the journal (with the committing transaction already subtracted). Rename functions to jbd2_log_space_left() and jbd2_space_needed() while we are changing them. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
2f387f84 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: remove outdated comment The comment about credit estimates isn't true anymore. We do what the comment describes now. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b34090e5 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: refine waiting for shadow buffers Currently when we add a buffer to a transaction, we wait until the buffer is removed from BJ_Shadow list (so that we prevent any changes to the buffer that is just written to the journal). This can take unnecessarily long as a lot happens between the time the buffer is submitted to the journal and the time when we remove the buffer from BJ_Shadow list. (e.g. We wait for all data buffers in the transaction, we issue a cache flush, etc.) Also this creates a dependency of do_get_write_access() on transaction commit (namely waiting for data IO to complete) which we want to avoid when implementing transaction reservation. So we modify commit code to set new BH_Shadow flag when temporary shadowing buffer is created and we clear that flag once IO on that buffer is complete. This allows do_get_write_access() to wait only for BH_Shadow bit and thus removes the dependency on data IO completion. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e5a120ae |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: remove journal_head from descriptor buffers Similarly as for metadata buffers, also log descriptor buffers don't really need the journal head. So strip it and remove BJ_LogCtl list. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f5113eff |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: don't create journal_head for temporary journal buffers When writing metadata to the journal, we create temporary buffer heads for that task. We also attach journal heads to these buffer heads but the only purpose of the journal heads is to keep buffers linked in transaction's BJ_IO list. We remove the need for journal heads by reusing buffer_head's b_assoc_buffers list for that purpose. Also since BJ_IO list is just a temporary list for transaction commit, we use a private list in jbd2_journal_commit_transaction() for that thus removing BJ_IO list from transaction completely. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
259709b0 |
|
21-May-2013 |
Lukas Czerner <lczerner@redhat.com> |
jbd2: change jbd2_journal_invalidatepage to accept length invalidatepage now accepts range to invalidate and there are two file system using jbd2 also implementing punch hole feature which can benefit from this. We need to implement the same thing for jbd2 layer in order to allow those file system take benefit of this functionality. This commit adds length argument to the jbd2_journal_invalidatepage() and updates all instances in ext4 and ocfs2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
f783f091 |
|
21-Apr-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: trace when lock_buffer in do_get_write_access takes a long time While investigating interactivity problems it was clear that processes sometimes stall for long periods of times if an attempt is made to lock a buffer which is undergoing writeback. It would stall in a trace looking something like [<ffffffff811a39de>] __lock_buffer+0x2e/0x30 [<ffffffff8123a60f>] do_get_write_access+0x43f/0x4b0 [<ffffffff8123a7cb>] jbd2_journal_get_write_access+0x2b/0x50 [<ffffffff81220f79>] __ext4_journal_get_write_access+0x39/0x80 [<ffffffff811f3198>] ext4_reserve_inode_write+0x78/0xa0 [<ffffffff811f3209>] ext4_mark_inode_dirty+0x49/0x220 [<ffffffff811f57d1>] ext4_dirty_inode+0x41/0x60 [<ffffffff8119ac3e>] __mark_inode_dirty+0x4e/0x2d0 [<ffffffff8118b9b9>] update_time+0x79/0xc0 [<ffffffff8118ba98>] file_update_time+0x98/0x100 [<ffffffff81110ffc>] __generic_file_aio_write+0x17c/0x3b0 [<ffffffff811112aa>] generic_file_aio_write+0x7a/0xf0 [<ffffffff811ea853>] ext4_file_write+0x83/0xd0 [<ffffffff81172b23>] do_sync_write+0xa3/0xe0 [<ffffffff811731ae>] vfs_write+0xae/0x180 [<ffffffff8117361d>] sys_write+0x4d/0x90 [<ffffffff8159d62d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
28daf4fa |
|
19-Apr-2013 |
Zheng Liu <wenqing.lz@taobao.com> |
jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset The jbd2_alloc_handle() function is only called by new_handle(). So this commit uses kmem_cache_zalloc() instead of kmem_cache_alloc()/memset(). Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
ad56edad |
|
11-Mar-2013 |
Jan Kara <jack@suse.cz> |
jbd2: fix use after free in jbd2_journal_dirty_metadata() jbd2_journal_dirty_metadata() didn't get a reference to journal_head it was working with. This is OK in most of the cases since the journal head should be attached to a transaction but in rare occasions when we are journalling data, __ext4_journalled_writepage() can race with jbd2_journal_invalidatepage() stripping buffers from a page and thus journal head can be freed under hands of jbd2_journal_dirty_metadata(). Fix the problem by getting own journal head reference in jbd2_journal_dirty_metadata() (and also in jbd2_journal_set_triggers() which can possibly have the same issue). Reported-by: Zheng Liu <gnehzuil.liu@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
df05c1b8 |
|
02-Mar-2013 |
Dmitry Monakhov <dmonakhov@openvz.org> |
jbd2: fix ERR_PTR dereference in jbd2__journal_start If start_this_handle() failed handle will be initialized to ERR_PTR() and can not be dereferenced. paging request at fffffffffffffff6 IP: [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 PGD 200e067 PUD 200f067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod CPU 0 journal commit I/O error Pid: 2694, comm: fio Not tainted 3.8.0-rc3+ #79 /DQ67SW RIP: 0010:[<ffffffff813c073f>] [<ffffffff813c073f>] jbd2__journal_start+0x18f/0x290 RSP: 0018:ffff880233b8ba58 EFLAGS: 00010292 RAX: 00000000ffffffe2 RBX: ffffffffffffffe2 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff82128f48 RBP: ffff880233b8ba98 R08: 0000000000000000 R09: ffff88021440a6e0 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
343d9c28 |
|
08-Feb-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: add tracepoints which provide per-handle statistics Handles which stay open a long time are problematic when it comes time to close down a transaction so it can be committed. These tracepoints will help us determine which ones are the problematic ones, and to validate whether changes makes things better or worse. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9fff24aa |
|
06-Feb-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: track request delay statistics Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
53e87268 |
|
25-Dec-2012 |
Jan Kara <jack@suse.cz> |
ext4: fix deadlock in journal_unmap_buffer() We cannot wait for transaction commit in journal_unmap_buffer() because we hold page lock which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible for waiting for transaction commit to finish and try invalidation again. Since the issue can happen only for page stradding i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d7961c7f |
|
20-Dec-2012 |
Jan Kara <jack@suse.cz> |
jbd2: fix assertion failure in jbd2_journal_flush() The following race is possible between start_this_handle() and someone calling jbd2_journal_flush(). Process A Process B start_this_handle(). if (journal->j_barrier_count) # false if (!journal->j_running_transaction) { #true read_unlock(&journal->j_state_lock); jbd2_journal_lock_updates() jbd2_journal_flush() write_lock(&journal->j_state_lock); if (journal->j_running_transaction) { # false ... wait for committing trans ... write_unlock(&journal->j_state_lock); ... write_lock(&journal->j_state_lock); if (!journal->j_running_transaction) { # true jbd2_get_transaction(journal, new_transaction); write_unlock(&journal->j_state_lock); goto repeat; # eventually blocks on j_barrier_count > 0 ... J_ASSERT(!journal->j_running_transaction); # fails We fix the race by rechecking j_barrier_count after reacquiring j_state_lock in exclusive mode. Reported-by: yjwsignal@empal.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
48fc7f7e |
|
19-Sep-2012 |
Adam Buchbinder <adam.buchbinder@gmail.com> |
Fix misspellings of "whether" in comments. "Whether" is misspelled in various comments across the tree; this fixes them. No code changes. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
37be2f59 |
|
08-Nov-2012 |
Eric Sandeen <sandeen@redhat.com> |
ext4: remove ext4_handle_release_buffer() ext4_handle_release_buffer() was intended to remove journal write access from a buffer, but it doesn't actually do anything at all other than add a BUFFER_TRACE point, but it's not reliably used for that either. Remove all the associated dead code. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
b794e7a6 |
|
26-Sep-2012 |
Jan Kara <jack@suse.cz> |
jbd2: fix assertion failure in commit code due to lacking transaction credits ext4 users of data=journal mode with blocksize < pagesize were occasionally hitting assertion failure in jbd2_journal_commit_transaction() checking whether the transaction has at least as many credits reserved as buffers attached. The core of the problem is that when a file gets truncated, buffers that still need checkpointing or that are attached to the committing transaction are left with buffer_mapped set. When this happens to buffers beyond i_size attached to a page stradding i_size, subsequent write extending the file will see these buffers and as they are mapped (but underlying blocks were freed) things go awry from here. The assertion failure just coincidentally (and in this case luckily as we would start corrupting filesystem) triggers due to journal_head not being properly cleaned up as well. We fix the problem by unmapping buffers if possible (in lots of cases we just need a buffer attached to a transaction as a place holder but it must not be written out anyway). And in one case, we just have to bite the bullet and wait for transaction commit to finish. CC: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
b2f4edb3 |
|
31-May-2012 |
Wanlong Gao <gaowanlong@cn.fujitsu.com> |
jbd2: use kmem_cache_zalloc wrapper instead of flag Use kmem_cache_zalloc wrapper instead of flag __GFP_ZERO. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
303a8f2a |
|
25-Nov-2011 |
Cong Wang <amwang@redhat.com> |
jbd2: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>
|
#
c254c9ec |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: remove always true condition in __journal_try_to_free_buffer() The check b_jlist == BJ_None in __journal_try_to_free_buffer() is always true (__jbd2_journal_temp_unlink_buffer() also checks this in an assertion) so just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
5bebccf9 |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: declare __jbd2_journal_temp_unlink_buffer() static Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
0c2022ec |
|
20-Feb-2012 |
Yongqiang Yang <xiaoqiangnk@gmail.com> |
jbd2: allocate transaction from separate slab cache There is normally only a handful of these active at any one time, but putting them in a separate slab cache makes debugging memory corruption problems easier. Manish Katiyar also wanted this make it easier to test memory failure scenarios in the jbd2 layer. Cc: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
15291164 |
|
20-Feb-2012 |
Eric Sandeen <sandeen@redhat.com> |
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head state ala discard_buffer(), but does not touch _Delay or _Unwritten as discard_buffer() does. This can be problematic in some areas of the ext4 code which assume that if they have found a buffer marked unwritten or delay, then it's a live one. Perhaps those spots should check whether it is mapped as well, but if jbd2 is going to tear down a buffer, let's really tear it down completely. Without this I get some fsx failures on sub-page-block filesystems up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go away, because buried within that large change is some more flag clearing. I still think it's worth doing in jbd2, since ->invalidatepage leads here directly, and it's the right place to clear away these flags. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
9837d8e9 |
|
04-Jan-2012 |
Jan Kara <jack@suse.cz> |
jbd2: fix hung processes in jbd2_journal_lock_updates() Toshiyuki Okajima found out that when running for ((i=0; i < 100000; i++)); do if ((i%2 == 0)); then chattr +j /mnt/file else chattr -j /mnt/file fi echo "0" >> /mnt/file done process sometimes hangs indefinitely in jbd2_journal_lock_updates(). Toshiyuki identified that the following race happens: jbd2_journal_lock_updates() |jbd2_journal_stop() ---------------------------------------+--------------------------------------- write_lock(&journal->j_state_lock) | . ++journal->j_barrier_count | . spin_lock(&tran->t_handle_lock) | . atomic_read(&tran->t_updates) //not 0 | | atomic_dec_and_test(&tran->t_updates) | // t_updates = 0 | wake_up(&journal->j_wait_updates) prepare_to_wait() | // no process is woken up. spin_unlock(&tran->t_handle_lock) | write_unlock(&journal->j_state_lock) | schedule() // never return | We fix the problem by first calling prepare_to_wait() and only after that checking t_updates in jbd2_journal_lock_updates(). Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f2a44523 |
|
01-Nov-2011 |
Eryu Guan <guaneryu@gmail.com> |
jbd2: Unify log messages in jbd2 code Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the same time other jbd2 code prints with "JBD: " prefix. Unify the prefix to "JBD2: ". Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
44705754 |
|
27-Oct-2011 |
Randy Dunlap <rdunlap@infradead.org> |
jbd2: fix build when CONFIG_BUG is not enabled Fix build error when CONFIG_BUG is not enabled: fs/jbd2/transaction.c:1175:3: error: implicit declaration of function '__WARN' by changing __WARN() to WARN_ON(), as suggested by Arnaud Lacombe <lacombar@gmail.com>. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arnaud Lacombe <lacombar@gmail.com>
|
#
d2159fb7 |
|
04-Sep-2011 |
Dan Carpenter <error27@gmail.com> |
jbd2: use gfp_t instead of int This silences some Sparse warnings: fs/jbd2/transaction.c:135:69: warning: incorrect type in argument 2 (different base types) fs/jbd2/transaction.c:135:69: expected restricted gfp_t [usertype] flags fs/jbd2/transaction.c:135:69: got int [signed] gfp_mask Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9ea7a0df |
|
04-Sep-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: add debugging information to jbd2_journal_dirty_metadata() Add debugging information in case jbd2_journal_dirty_metadata() is called with a buffer_head which didn't have jbd2_journal_get_write_access() called on it, or if the journal_head has the wrong transaction in it. In addition, return an error code. This won't change anything for ocfs2, which will BUG_ON() the non-zero exit code. For ext4, the caller of this function is ext4_handle_dirty_metadata(), and on seeing a non-zero return code, will call __ext4_journal_stop(), which will print the function and line number of the (buggy) calling function and abort the journal. This will allow us to recover instead of bug halting, which is better from a robustness and reliability point of view. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
de1b7941 |
|
13-Jun-2011 |
Jan Kara <jack@suse.cz> |
jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2_journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 jbd2_journal_commit_transaction() ... processing t_forget list __jbd2_journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); jbd2_journal_try_to_free_buffers() jbd2_journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() jbd2_journal_put_journal_head(jh) jbd2_journal_remove_journal_head(bh); jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after jbd2_journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via jbd2_journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]jbd2_journal_refile_buffer(), [__]jbd2_journal_unfile_buffer(), and __jdb2_journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
1fb74cda |
|
12-Jun-2011 |
Tao Ma <boyu.mt@taobao.com> |
jbd2: Remove obsolete parameters in the comments for some jbd2 functions credits isn't a parameter for jbd2_journal_get_write_access and jbd2_journal_get_undo_access. So remove the corresponding comments. Acked-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
3991b400 |
|
25-May-2011 |
Ding Dinghua <dingdinghua@nrchpc.ac.cn> |
jbd2: fix a potential leak of a journal_head on an error path drop jh->b_jcount in error path Signed-off-by: Ding Dinghua <dingdinghua@nrchpc.ac.cn> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
c867516d |
|
24-May-2011 |
Eryu Guan <guaneryu@gmail.com> |
jbd2: Fix comment to match the code in jbd2__journal_start() jbd2__journal_start() returns an ERR_PTR() value rather than NULL on failure. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
81be12c8 |
|
24-May-2011 |
Jan Kara <jack@suse.cz> |
jbd2: fix sending of data flush on journal commit In data=ordered mode, it's theoretically possible (however rare) that an inode is filed to transaction's t_inode_list and a flusher thread writes all the data and inode is reclaimed before the transaction starts to commit. In such a case, we could erroneously omit sending a flush to file system device when it is different from the journal device (because data can still be in disk cache only). Fix the problem by setting a flag in a transaction when some inode is added to it and then send disk flush in the commit code when the flag is set. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
28e35e42 |
|
22-May-2011 |
Tao Ma <boyu.mt@taobao.com> |
jbd2: Fix the wrong calculation of t_max_wait in update_t_max_wait t_max_wait is added in commit 8e85fb3f to indicate how long we were waiting for new transaction to start. In commit 6d0bf005, it is moved to another function named update_t_max_wait to avoid a build warning. But the wrong thing is that the original 'ts' is initialized in the start of function start_this_handle and we can calculate t_max_wait in the right way. while with this change, ts is initialized within the function and t_max_wait can never be calculated right. This patch moves the initialization of ts to the original beginning of start_this_handle and pass it to function update_t_max_wait so that it can be calculated right and the build warning is avoided also. Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
e4471831 |
|
12-Feb-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: call __jbd2_log_start_commit with j_state_lock write locked On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
ae00b267 |
|
18-Dec-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: remove unnecessary goto statement This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
a1dd5331 |
|
18-Dec-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: use offset_in_page() instead of manual calculation This is a port to jbd2 of a patch which Namhyung Kim <namhyung@gmail.com> originally made to fs/jbd. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
cfef2c6a |
|
18-Dec-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Fix a debug message in do_get_write_access() 'buffer_head' should be 'journal_head' This is a port of a patch which Namhyung Kim <namhyung@gmail.com> made to fs/jbd to jbd2. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
a34f0b31 |
|
10-Dec-2010 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
fix comment typos concerning "consistent" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
5c2178e7 |
|
27-Oct-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Add sanity check for attempts to start handle during umount An attempt to modify the file system during the call to jbd2_destroy_journal() can lead to a system lockup. So add some checking to make it much more obvious when this happens to and to determine where the offending code is located. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
6d0bf005 |
|
09-Aug-2010 |
Theodore Ts'o <tytso@mit.edu> |
ext4: clean up compiler warning in start_this_handle() Fix the compiler warning: fs/jbd2/transaction.c: In function ‘start_this_handle’: fs/jbd2/transaction.c:98: warning: unused variable ‘ts’ Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8dd42046 |
|
03-Aug-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Remove t_handle_lock from start_this_handle() This should remove the last exclusive lock from start_this_handle(), so that we should now be able to start multiple transactions at the same time on large SMP systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
a931da6a |
|
03-Aug-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Change j_state_lock to be a rwlock_t Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
a51dca9c |
|
02-Aug-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Use atomic variables to avoid taking t_handle_lock in jbd2_journal_stop By using an atomic_t for t_updates and t_outstanding credits, this should allow us to not need to take transaction t_handle_lock in jbd2_journal_stop(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
47def826 |
|
27-Jul-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Remove __GFP_NOFAIL from jbd2 layer __GFP_NOFAIL is going away, so add our own retry loop. Also add jbd2__journal_start() and jbd2__journal_restart() which take a gfp mask, so that file systems can optionally (re)start transaction handles using GFP_KERNEL. If they do this, then they need to be prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready to reflect that error up to userspace. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
13ceef09 |
|
13-Jul-2010 |
Jan Kara <jack@suse.cz> |
jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
c35a56a0 |
|
16-May-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop() One of the most contended locks in the jbd2 layer is j_state_lock when running dbench. This is especially true if using the real-time kernel with its "sleeping spinlocks" patch that replaces spinlocks with priority inheriting mutexes --- but it also shows up on large SMP benchmarks. Thanks to John Stultz for pointing this out. Reviewed by Mingming Cao and Jan Kara. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
ba869023 |
|
15-Feb-2010 |
dingdinghua <dingdinghua@nrchpc.ac.cn> |
jbd2: delay discarding buffers in journal_unmap_buffer Delay discarding buffers in journal_unmap_buffer until we know that "add to orphan" operation has definitely been committed, otherwise the log space of committing transation may be freed and reused before truncate get committed, updates may get lost if crash happens. Signed-off-by: dingdinghua <dingdinghua@nrchpc.ac.cn> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9599b0e5 |
|
17-Aug-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Annotate transaction start also for jbd2_journal_restart() lockdep annotation for a transaction start has been at the end of jbd2_journal_start(). But a transaction is also started from jbd2_journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b1f485f2 |
|
10-Aug-2009 |
Andreas Dilger <adilger@sun.com> |
jbd2: round commit timer up to avoid uncommitted transaction fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or expicit sync/umount happens. Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com> Signed-off-by: Andreas Dilger <adilger@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
536fc240 |
|
17-Jun-2009 |
Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> |
jbd2: clean up jbd2_journal_try_to_free_buffers() This patch reverts 3f31fddf, which is no longer needed because if a race between freeing buffer and committing transaction functionality occurs and dio gets error, currently dio falls back to buffered IO due to the commit 6ccfa806. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f91d1d04 |
|
13-Jul-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Fix a race between checkpointing code and journal_get_write_access() The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd2_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
7058548c |
|
25-Mar-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4: Use WRITE_SYNC for commits which are caused by fsync() If a commit is triggered by fsync(), set a flag indicating the journal blocks associated with the transaction should be flushed out using WRITE_SYNC. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
7f5aa215 |
|
10-Feb-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspitious code. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>
|
#
e06c8227 |
|
11-Sep-2008 |
Joel Becker <joel.becker@oracle.com> |
jbd2: Add buffer triggers Filesystems often to do compute intensive operation on some metadata. If this operation is repeated many times, it can be very expensive. It would be much nicer if the operation could be performed once before a buffer goes to disk. This adds triggers to jbd2 buffer heads. Just before writing a metadata buffer to the journal, jbd2 will optionally call a commit trigger associated with the buffer. If the journal is aborted, an abort trigger will be called on any dirty buffers as they are dropped from pending transactions. ocfs2 will use this feature. Initially I tried to come up with a more generic trigger that could be used for non-buffer-related events like transaction completion. It doesn't tie nicely, because the information a buffer trigger needs (specific to a journal_head) isn't the same as what a transaction trigger needs (specific to a tranaction_t or perhaps journal_t). So I implemented a buffer set, with the understanding that journal/transaction wide triggers should be implemented separately. There is only one trigger set allowed per buffer. I can't think of any reason to attach more than one set. Contrast this with a journal or transaction in which multiple places may want to watch the entire transaction separately. The trigger sets are considered static allocation from the jbd2 perspective. ocfs2 will just have one trigger set per block type, setting the same set on every bh of the same type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
30773840 |
|
03-Jan-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add fsync batch tuning knobs Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e07f7183 |
|
25-Nov-2008 |
Josef Bacik <jbacik@redhat.com> |
jbd2: improve jbd2 fsync batching This patch removes the static sleep time in favor of a more self optimizing approach where we measure the average amount of time it takes to commit a transaction to disk and the ammount of time a transaction has been running. If somebody does a sync write or an fsync() traditionally we would sleep for 1 jiffies, which depending on the value of HZ could be a significant amount of time compared to how long it takes to commit a transaction to the underlying storage. With this patch instead of sleeping for a jiffie, we check to see if the amount of time this transaction has been running is less than the average commit time, and if it is we sleep for the delta using schedule_hrtimeout to give us a higher precision sleep time. This greatly benefits high end storage where you could end up sleeping for longer than it takes to commit the transaction and therefore sitting idle instead of allowing the transaction to be committed by keeping the sleep time to a minimum so you are sure to always be doing something. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
3e624fc7 |
|
16-Oct-2008 |
Theodore Ts'o <tytso@mit.edu> |
ext4: Replace hackish ext4_mb_poll_new_transaction with commit callback The multiblock allocator needs to be able to release blocks (and issue a blkdev discard request) when the transaction which freed those blocks is committed. Previously this was done via a polling mechanism when blocks are allocated or freed. A much better way of doing things is to create a jbd2 callback function and attaching the list of blocks to be freed directly to the transaction structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
3295f0ef |
|
11-Aug-2008 |
Ingo Molnar <mingo@elte.hu> |
lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]() the names were too generic: drivers/uio/uio.c:87: error: expected identifier or '(' before 'do' drivers/uio/uio.c:87: error: expected identifier or '(' before 'while' drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function) Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
4f3e7524 |
|
11-Aug-2008 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
lockdep: map_acquire Most the free-standing lock_acquire() usages look remarkably similar, sweep them into a new helper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
#
87c89c23 |
|
11-Jul-2008 |
Jan Kara <jack@suse.cz> |
jbd2: Remove data=ordered mode support using jbd buffer heads Signed-off-by: Jan Kara <jack@suse.cz>
|
#
c851ed54 |
|
11-Jul-2008 |
Jan Kara <jack@suse.cz> |
jbd2: Implement data=ordered mode handling via inodes This patch adds necessary framework into JBD2 to be able to track inodes with each transaction and write-out their dirty data during transaction commit time. This new ordered mode brings all sorts of advantages such as possibility to get rid of journal heads and buffer heads for data buffers in ordered mode, better ordering of writes on transaction commit, simplification of some JBD code, no more anonymous pages when truncate of data being committed happens. Also with this new ordered mode, delayed allocation on ordered mode is much simpler. Signed-off-by: Jan Kara <jack@suse.cz>
|
#
530576bb |
|
13-Jul-2008 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction journal_try_to_free_buffers() could race with jbd commit transaction when the later is holding the buffer reference while waiting for the data buffer to flush to disk. If the caller of journal_try_to_free_buffers() request tries hard to release the buffers, it will treat the failure as error and return back to the caller. We have seen the directo IO failed due to this race. Some of the caller of releasepage() also expecting the buffer to be dropped when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers(). With this patch, if the caller is passing the GFP_KERNEL to indicating this call could wait, in case of try_to_free_buffers() failed, let's waiting for journal_commit_transaction() to finish commit the current committing transaction , then try to free those buffers again with journal locked. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
329d291f |
|
17-Apr-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
jdb2: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
5648ba5b |
|
17-Apr-2008 |
Randy Dunlap <randy.dunlap@oracle.com> |
jbd2: fix kernel-doc notation Fix kernel-doc notation in jbd2. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
1dfc3220 |
|
17-Apr-2008 |
Josef Bacik <jbacik@redhat.com> |
jbd2: fix possible journal overflow issues There are several cases where the running transaction can get buffers added to its BJ_Metadata list which it never dirtied, which makes its t_nr_buffers counter end up larger than its t_outstanding_credits counter. This will cause issues when starting new transactions as while we are logging buffers we decrement t_outstanding_buffers, so when t_outstanding_buffers goes negative, we will report that we need less space in the journal than we actually need, so transactions will be started even though there may not be enough room for them. In the worst case scenario (which admittedly is almost impossible to reproduce) this will result in the journal running out of space. The fix is to only refile buffers from the committing transaction to the running transactions BJ_Modified list when b_modified is set on that journal, which is the only way to be sure if the running transaction has modified that buffer. This patch also fixes an accounting error in journal_forget, it is possible that we can call journal_forget on a buffer without having modified it, only gotten write access to it, so instead of freeing a credit, we only do so if the buffer was modified. The assert will help catch if this problem occurs. Without these two patches I could hit this assert within minutes of running postmark, with them this issue no longer arises. Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9fc7c63a |
|
17-Apr-2008 |
Josef Bacik <jbacik@redhat.com> |
jbd2: fix the way the b_modified flag is cleared Currently at the start of a journal commit we loop through all of the buffers on the committing transaction and clear the b_modified flag (the flag that is set when a transaction modifies the buffer) under the j_list_lock. The problem is that everywhere else this flag is modified only under the jbd2 lock buffer flag, so it will race with a running transaction who could potentially set it, and have it unset by the committing transaction. This is also a big waste, you can have several thousands of buffers that you are clearing the modified flag on when you may not need to. This patch removes this code and instead clears the b_modified flag upon entering do_get_write_access/journal_get_create_access, so if that transaction does indeed use the buffer then it will be accounted for properly, and if it does not then we know we didn't use it. That will be important for the next patch in this series. Tested thoroughly by myself using postmark/iozone/bonnie++. Cc: <linux-ext4@vger.kernel.org> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
4019191b |
|
28-Jan-2008 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: sparse pointer use of zero as null Get rid of sparse related warnings from places that use integer as NULL pointer. (Ported from upstream ext3/jbd changes.) Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
db857da3 |
|
28-Jan-2008 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: Use round-jiffies() function for the "5 second" ext4/jbd2 wakeup While "every 5 seconds" doesn't sound as a problem, there can be many of these (and these timers do add up over all the kernel). The "5 second" wakeup isn't really timing sensitive; in addition even with rounding it'll still happen every 5 seconds (with the exception of the very first time, which is likely to be rounded up to somewhere closer to 6 seconds) (Ported from similar JBD patch made by Arjan van de Ven to fs/jbd/transaction.c) Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
7b751066 |
|
28-Jan-2008 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: add lockdep support Ported from similar patch for the jbd layer. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8e85fb3f |
|
28-Jan-2008 |
Johann Lombardi <johann.lombardi@bull.net> |
jbd2: jbd2 stats through procfs The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [root@bob ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R 261 8260 2720 0 0 750 9892 8170 8187 C 259 750 0 4885 1 R 262 20 2200 10 0 770 9836 8170 8187 R 263 30 2200 10 0 3070 9812 8170 8187 R 264 0 5000 10 0 1340 0 0 0 C 261 8240 3212 4957 0 R 265 8260 1470 0 0 4640 9854 8170 8187 R 266 0 5000 10 0 1460 0 0 0 C 262 8210 2989 4868 0 R 267 8230 1490 10 0 4440 9875 8171 8188 R 268 0 5000 10 0 1260 0 0 0 C 263 7710 2937 4908 0 R 269 7730 1470 10 0 3330 9841 8170 8187 R 270 0 5000 10 0 830 0 0 0 C 265 8140 3234 4898 0 C 267 720 0 4849 1 R 271 8630 2740 20 0 740 9819 8170 8187 C 269 800 0 4214 1 R 272 40 2170 10 0 830 9716 8170 8187 R 273 40 2280 0 0 3530 9799 8170 8187 R 274 0 5000 10 0 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [root@bob ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
|
#
d802ffa8 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4 Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
2d917969 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2: replace jbd_kmalloc with kmalloc directly. This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
af1e76d6 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2: jbd2 slab allocation cleanups JBD2: Replace slab allocations with page allocations JBD2 allocate memory for committed_data and frozen_data from slab. However JBD2 should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
58862699 |
|
08-May-2007 |
Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> |
fix file specification in comments Many files include the filename at the beginning, serveral used a wrong one. Signed-off-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
#
e63340ae |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7ddae860 |
|
06-Dec-2006 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] make fs/jbd2/transaction.c:__kbd2_journal_temp_unlink_buffer() static Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
9b57988d |
|
28-Oct-2006 |
Eric Sandeen <sandeen@sandeen.net> |
[PATCH] jbd2: journal_dirty_data re-check for unmapped buffers When running several fsx's and other filesystem stress tests, we found cases where an unmapped buffer was still being sent to submit_bh by the ext3 dirty data journaling code. I saw this happen in two ways, both related to another thread doing a truncate which would unmap the buffer in question. Either we would get into journal_dirty_data with a bh which was already unmapped (although journal_dirty_data_fn had checked for this earlier, the state was not locked at that point), or it would get unmapped in the middle of journal_dirty_data when we dropped locks to call sync_dirty_buffer. By re-checking for mapped state after we've acquired the bh state lock, we should avoid these races. If we find a buffer which is no longer mapped, we essentially ignore it, because journal_unmap_buffer has already decided that this buffer can go away. I've also added tracepoints in these two cases, and made a couple other tracepoint changes that I found useful in debugging this. Signed-off-by: Eric Sandeen <esandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
3e2a532b |
|
20-Oct-2006 |
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> |
[PATCH] ext3/4: fix J_ASSERT(transaction->t_updates > 0) in journal_stop() A disk generated some I/O error, after it, I hitted J_ASSERT(transaction->t_updates > 0) in journal_stop(). It seems to happened on ext3_truncate() path from stack trace. Then, maybe the following case may trigger J_ASSERT(transaction->t_updates > 0). ext3_truncate() -> ext3_free_branches() -> ext3_journal_test_restart() -> ext3_journal_restart() -> journal_restart() transaction->t_updates--; /* another process aborted journal */ -> start_this_handle() returns -EROFS without transaction->t_updates++; -> ext3_journal_stop() -> journal_stop() J_ASSERT(transaction->t_updates > 0) If journal was aborted in middle of journal_restart(), ext3_truncate() may trigger J_ASSERT(). Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f7f4bccb |
|
11-Oct-2006 |
Mingming Cao <cmm@us.ibm.com> |
[PATCH] jbd2: rename jbd2 symbols to avoid duplication of jbd symbols Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
470decc6 |
|
11-Oct-2006 |
Dave Kleikamp <shaggy@austin.ibm.com> |
[PATCH] jbd2: initial copy of files from jbd This is a simple copy of the files in fs/jbd to fs/jbd2 and /usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|