#
62ec1707 |
|
12-Dec-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
jbd2: replace journal state flag by checking errseq Now JBD2 detects metadata writeback error of fs dev according to errseq. Replace journal state flag by checking errseq. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
990b6b5b |
|
12-Dec-2023 |
Zhihao Cheng <chengzhihao1@huawei.com> |
jbd2: add errseq to detect client fs's bdev writeback error Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in recovery process to replace original solution(using local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6a3afb6a |
|
29-Nov-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: increase the journal IO's priority Current jbd2 only add REQ_SYNC for descriptor block, metadata log buffer, commit buffer and superblock buffer, the submitted IO could be throttled by writeback throttle in block layer, that could lead to priority inversion in some cases. The log IO looks like a kind of high priority metadata IO, so it should not be throttled by WBT like QOS policies in block layer, let's add REQ_SYNC | REQ_IDLE to exempt from writeback throttle, and also add REQ_META together indicates it's a metadata IO. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
85559227 |
|
29-Nov-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: correct the printing of write_flags in jbd2_write_superblock() The write_flags print in the trace of jbd2_write_superblock() is not real, so move the modification before the trace. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4b403dfa |
|
11-Sep-2023 |
Qi Zheng <zhengqi.arch@bytedance.com> |
jbd2,ext4: dynamically allocate the jbd2-journal shrinker In preparation for implementing lockless slab shrink, use new APIs to dynamically allocate the jbd2-journal shrinker, so that it can be freed asynchronously via RCU. Then it doesn't need to wait for RCU read-side critical section when releasing the struct journal_s. Link: https://lkml.kernel.org/r/20230911094444.68966-32-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
1bb0763f |
|
10-Sep-2023 |
Li Zetao <lizetao1@huawei.com> |
jbd2: Fix memory leak in journal_init_common() There is a memory leak reported by kmemleak: unreferenced object 0xff11000105903b80 (size 64): comm "mount", pid 3382, jiffies 4295032021 (age 27.826s) hex dump (first 32 bytes): 04 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ ff ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffae86ac40>] __kmalloc_node+0x50/0x160 [<ffffffffaf2486d8>] crypto_alloc_tfmmem.isra.0+0x38/0x110 [<ffffffffaf2498e5>] crypto_create_tfm_node+0x85/0x2f0 [<ffffffffaf24a92c>] crypto_alloc_tfm_node+0xfc/0x210 [<ffffffffaedde777>] journal_init_common+0x727/0x1ad0 [<ffffffffaede1715>] jbd2_journal_init_inode+0x2b5/0x500 [<ffffffffaed786b5>] ext4_load_and_init_journal+0x255/0x2440 [<ffffffffaed8b423>] ext4_fill_super+0x8823/0xa330 ... The root cause was traced to an error handing path in journal_init_common() when malloc memory failed in register_shrinker(). The checksum driver is used to reference to checksum algorithm via cryptoapi and the user should release the memory when the driver is no longer needed or the journal initialization failed. Fix it by calling crypto_free_shash() on the "err_cleanup" error handing path in journal_init_common(). Fixes: c30713084ba5 ("jbd2: move load_superblock() into journal_init_common()") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/20230911025138.983101-1-lizetao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8e6cf5fb |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: jbd2_journal_init_{dev,inode} return proper error return value Current jbd2_journal_init_{dev,inode} return NULL if some error happens, make them to pass out proper error return value. [ Fix from Yang Yingliang folded in. ] Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-11-yi.zhang@huaweicloud.com Link: https://lore.kernel.org/r/20230822030018.644419-1-yangyingliang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d9a45496 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: drop useless error tag in jbd2_journal_wipe() no_recovery is redundant, just drop it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
49887e47 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: cleanup journal_init_common() Adjust the initialization sequence and error handle of journal_t, moving load superblock to the begin, and classify others initialization. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0dbc759a |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: add fast_commit space check If JBD2_FEATURE_INCOMPAT_FAST_COMMIT bit is set, it means the journal have fast commit records need to recover, so the fast commit size should not be too large, and the leftover normal journal size should never less than JBD2_MIN_JOURNAL_BLOCKS. If it happens, the journal->j_last is likely to be wrong and will probably lead to incorrect journal recovery. So add a check into the journal_check_superblock(), and drop the pointless check when initializing the fastcommit parameters. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
054d9c8f |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: cleanup load_superblock() Rename load_superblock() to journal_load_superblock(), move getting and reading superblock from journal_init_common() and journal_get_superblock() to this function, and also rename journal_get_superblock() to journal_check_superblock(), make it a pure check helper to check superblock validity from disk. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
18dad509 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: open code jbd2_verify_csum_type() helper jbd2_verify_csum_type() helper check checksum type in the superblock for v2 or v3 checksum feature, it always return true if these features are not enabled, and it has only one user, so open code it is more clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e4adf8b8 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: checking valid features early in journal_get_superblock() journal_get_superblock() is used to check validity of the jounal supberblock, so move the features checks from jbd2_journal_load() to journal_get_superblock(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
9600f3e5 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: don't load superblock in jbd2_journal_check_used_features() Since load_superblock() has been moved to journal_init_common(), the in-memory superblock structure is initialized and contains valid data once the file system has a journal_t object, so it's safe to access it, let's drop the call to journal_get_superblock() from jbd2_journal_check_used_features() and also drop the setting/clearing of the veirfy bit of the superblock buffer. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c3071308 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: move load_superblock() into journal_init_common() Move the call to load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() early into journal_init_common(), the journal superblock gets read and the in-memory journal_t structure gets initialised after calling jbd2_journal_init_{dev,inode}, it's safe to do following initialization according to it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
29a511e4 |
|
11-Aug-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: move load_superblock() dependent functions Move load_superblock() declaration and the functions it calls before journal_init_common(). This is a preparation for moving a call to load_superblock() from jbd2_journal_load() and jbd2_journal_wipe() to journal_init_common(). No functional changes. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230811063610.2980059-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
1d401650 |
|
13-Apr-2023 |
Guoqing Cai <u202112087@hust.edu.cn> |
fs: jbd2: fix an incorrect warn log In jbd2_journal_load(), when journal_reset fails, it prints an incorrect warn log. Fix this by changing the goto statement to return statement. Also, return actual error code from jbd2_journal_recover() and journal_reset(). Signed-off-by: Guoqing Cai <u202112087@hust.edu.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230413095740.2222066-1-u202112087@hust.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8147c4c4 |
|
12-Jul-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
jbd2: use a folio in jbd2_journal_write_metadata_buffer() The primary goal here is removing the use of set_bh_page(). Take the opportunity to switch from kmap_atomic() to kmap_local(). This simplifies the function as the offset is already added to the pointer. Link: https://lkml.kernel.org/r/20230713035512.4139457-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Sterba <dsterba@suse.com> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Tom Rix <trix@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
31464ab0 |
|
15-Jun-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: skip reading super block if it has been verified We got a NULL pointer dereference issue below while running generic/475 I/O failure pressure test. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190 RIP: 0010:jbd2_journal_set_features+0x13d/0x430 ... Call Trace: <TASK> ? __die+0x23/0x60 ? page_fault_oops+0xa4/0x170 ? exc_page_fault+0x67/0x170 ? asm_exc_page_fault+0x26/0x30 ? jbd2_journal_set_features+0x13d/0x430 jbd2_journal_revoke+0x47/0x1e0 __ext4_forget+0xc3/0x1b0 ext4_free_blocks+0x214/0x2f0 ext4_free_branches+0xeb/0x270 ext4_ind_truncate+0x2bf/0x320 ext4_truncate+0x1e4/0x490 ext4_handle_inode_extension+0x1bd/0x2a0 ? iomap_dio_complete+0xaf/0x1d0 The root cause is the journal super block had been failed to write out due to I/O fault injection, it's uptodate bit was cleared by end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock(). And it raced by journal_get_superblock()->bh_read(), unfortunately, the read IO is also failed, so the error handling in journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer, finally lead to above NULL pointer dereference issue. If the journal super block had been read and verified, there is no need to call bh_read() read it again even if it has been failed to written out. So the fix could be simply move buffer_verified(bh) in front of bh_read(). Also remove a stale comment left in jbd2_journal_check_used_features(). Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c7fc6055 |
|
21-Mar-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: continue to record log between each mount For a newly mounted file system, the journal committing thread always record new transactions from the start of the journal area, no matter whether the journal was clean or just has been recovered. So the logdump code in debugfs cannot dump continuous logs between each mount, it is disadvantageous to analysis corrupted file system image and locate the file system inconsistency bugs. If we get a corrupted file system in the running products and want to find out what has happened, besides lookup the system log, one effective way is to backtrack the journal log. But we may not always run e2fsck before each mount and the default fsck -a mode also cannot always checkout all inconsistencies, so it could left over some inconsistencies into the next mount until we detect it. Finally, transactions in the journal may probably discontinuous and some relatively new transactions has been covered, it becomes hard to analyse. If we could record transactions continuously between each mount, we could acquire more useful info from the journal. Like this: |Previous mount checkpointed/recovered logs|Current mount logs | |{------}{---}{--------} ... {------}| ... |{======}{========}...000000| And yes the journal area is limited and cannot record everything, the problematic transaction may also be covered even if we do this, but this is still useful for fuzzy tests and short-running products. This patch save the head blocknr in the superblock after flushing the journal or unmounting the file system, let the next mount could continue to record new transaction behind it. This change is backward compatible because the old kernel does not care about the head blocknr of the journal. It is also fine if we mount a clean old image without valid head blocknr, we fail back to set it to s_first just like before. Finally, for the case of mount an unclean file system, we could also get the journal head easily after scanning/replaying the journal, it will continue to record new transaction after the recovered transactions. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
04c2e981 |
|
14-Mar-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: remove j_format_version journal->j_format_version is no longer used, remove it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
3e5cf02c |
|
14-Mar-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: factor out journal initialization from journal_get_superblock() Current journal_get_superblock() couple journal superblock checking and partial journal initialization, factor out initialization part from it to make things clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
5cf036d4 |
|
14-Mar-2023 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: switch to check format version in superblock directly We should only check and set extented features if journal format version is 2, and now we check the in memory copy of the superblock 'journal->j_format_version', which relys on the parameter initialization sequence, switch to use the h_blocktype in superblock cloud be more clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
7afb6d8f |
|
05-Jun-2023 |
Andy Shevchenko <andriy.shevchenko@linux.intel.com> |
jbd2: Avoid printing outside the boundary of the buffer Theoretically possible that "%pg" will take all room for the j_devname and hence the "-%lu" will go outside the boundary due to unconditional sprintf() in use. To make this code more robust, replace two sequential s*printf():s by a single call and then replace forbidden character. It's possible to do this way, because '/' won't ever be in the result of "-%lu". Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230605170553.7835-2-andriy.shevchenko@linux.intel.com
|
#
62913ae9 |
|
07-Mar-2023 |
Theodore Ts'o <tytso@mit.edu> |
ext4, jbd2: add an optimized bmap for the journal inode The generic bmap() function exported by the VFS takes locks and does checks that are not necessary for the journal inode. So allow the file system to set a journal-optimized bmap function in journal->j_bmap. Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
cff61bbc |
|
29-Dec-2022 |
Christoph Hellwig <hch@lst.de> |
jbd2,ocfs2: move jbd2_journal_submit_inode_data_buffers to ocfs2 jbd2_journal_submit_inode_data_buffers is only used by ocfs2, so move it there to prepare for removing generic_writepages. Link: https://lkml.kernel.org/r/20221229161031.391878-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
0d22fe2f |
|
15-Dec-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
jbd2: replace obvious uses of b_page with b_folio These places just use b_page to get to the buffer's address_space or have already been converted to folio. Link: https://lkml.kernel.org/r/20221215214402.3522366-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
8c004d1f |
|
01-Sep-2022 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: replace ll_rw_block() ll_rw_block() is not safe for the sync read path because it cannot guarantee that submitting read IO if the buffer has been locked. We could get false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in journal_get_superblock(). We also switch to new bh_readahead_batch() for the buffer array readahead path. Link: https://lkml.kernel.org/r/20220901133505.2510834-7-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
243d1a5d |
|
14-Sep-2022 |
Ye Bin <yebin10@huawei.com> |
jbd2: fix potential use-after-free in jbd2_fc_wait_bufs In 'jbd2_fc_wait_bufs' use 'bh' after put buffer head reference count which may lead to use-after-free. So judge buffer if uptodate before put buffer head reference count. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e0d5fc7a |
|
14-Sep-2022 |
Ye Bin <yebin10@huawei.com> |
jbd2: fix potential buffer head reference count leak As in 'jbd2_fc_wait_bufs' if buffer isn't uptodate, will return -EIO without update 'journal->j_fc_off'. But 'jbd2_fc_release_bufs' will release buffer head from ‘j_fc_off - 1’ if 'bh' is NULL will terminal release which will lead to buffer head buffer head reference count leak. To solve above issue, update 'journal->j_fc_off' before return -EIO. Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220914100812.1414768-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
f3ed5df3 |
|
17-Aug-2022 |
Ritesh Harjani (IBM) <ritesh.list@gmail.com> |
jbd2: drop useless return value of submit_bh submit_bh always returns 0. This patch cleans up 2 of it's caller in jbd2 to drop submit_bh's useless return value. Once all submit_bh callers are cleaned up, we can make it's return type as void. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://lore.kernel.org/r/e069c0539be0aec61abcdc6f6141982ec85d489d.1660788334.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e33c267a |
|
31-May-2022 |
Roman Gushchin <roman.gushchin@linux.dev> |
mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: <subsystem>-<shrinker_type>[:<instance>]-<id> For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot <lkp@intel.com> Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Dave Chinner <dchinner@redhat.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
d1324958 |
|
08-Jun-2022 |
Jan Kara <jack@suse.cz> |
jbd2: unexport jbd2_log_start_commit() jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also make __jbd2_log_start_commit() static when we are at it. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
68af74e9 |
|
08-Jun-2022 |
Jan Kara <jack@suse.cz> |
jbd2: remove unused exports for jbd2 debugging Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the first is used only in fs/jbd2/journal.c and the second only within jbd2 code. Remove the pointless exports make jbd2_journal_enable_debug static. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
cb3b3bf2 |
|
08-Jun-2022 |
Jan Kara <jack@suse.cz> |
jbd2: rename jbd_debug() to jbd2_debug() The name of jbd_debug() is confusing as all functions inside jbd2 have jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6669797b |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers Commit 2a222ca992c3 ("fs: have submit_bh users pass in op and flags separately") renamed the jbd2_write_superblock() 'write_op' argument into 'write_flags'. Propagate this change to the jbd2_write_superblock() callers. Additionally, change the type of 'write_flags' into blk_opf_t. Cc: Mike Christie <michael.christie@oracle.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-57-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1420c4a5 |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
fs/buffer: Combine two submit_bh() and ll_rw_block() arguments Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
900d156b |
|
12-Jul-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove bdevname Replace the remaining calls of bdevname with snprintf using the %pg format specifier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
44abff2c |
|
14-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
70200574 |
|
14-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ccd16945 |
|
09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ext4: Convert invalidatepage to invalidate_folio Extensive changes, but fairly mechanical. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
|
#
715a67f1 |
|
10-Jan-2022 |
Yang Li <yang.lee@linux.alibaba.com> |
jbd2: fix kernel-doc descriptions for jbd2_journal_shrink_{scan,count}() Add the description of @shrink and @sc in jbd2_journal_shrink_scan() and jbd2_journal_shrink_count() kernel-doc comment to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/jbd2/journal.c:1296: warning: Function parameter or member 'shrink' not described in 'jbd2_journal_shrink_scan' fs/jbd2/journal.c:1296: warning: Function parameter or member 'sc' not described in 'jbd2_journal_shrink_scan' fs/jbd2/journal.c:1320: warning: Function parameter or member 'shrink' not described in 'jbd2_journal_shrink_count' fs/jbd2/journal.c:1320: warning: Function parameter or member 'sc' not described in 'jbd2_journal_shrink_count' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220110132841.34531-1-yang.lee@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
e85c81ba |
|
17-Jan-2022 |
Xin Yin <yinxin.x@bytedance.com> |
ext4: fast commit may not fallback for ineligible commit For the follow scenario: 1. jbd start commit transaction n 2. task A get new handle for transaction n+1 3. task A do some ineligible actions and mark FC_INELIGIBLE 4. jbd complete transaction n and clean FC_INELIGIBLE 5. task A call fsync In this case fast commit will not fallback to full commit and transaction n+1 also not handled by jbd. Make ext4_fc_mark_ineligible() also record transaction tid for latest ineligible case, when call ext4_fc_cleanup() check current transaction tid, if small than latest ineligible tid do not clear the EXT4_MF_FC_INELIGIBLE. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Suggested-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Signed-off-by: Xin Yin <yinxin.x@bytedance.com> Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
|
#
4cd1103d |
|
29-Jan-2022 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
jbd2: export jbd2_journal_[grab|put]_journal_head Patch series "ocfs2: fix a deadlock case". This fixes a deadlock case in ocfs2. We firstly export jbd2 symbols jbd2_journal_[grab|put]_journal_head as preparation and later use them in ocfs2 insread of jbd_[lock|unlock]_bh_journal_head to fix the deadlock. This patch (of 2): This exports symbols jbd2_journal_[grab|put]_journal_head, which will be used outside modules, e.g. ocfs2. Link: https://lkml.kernel.org/r/20220121071205.100648-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com> Cc: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
359745d7 |
|
21-Jan-2022 |
Muchun Song <songmuchun@bytedance.com> |
proc: remove PDE_DATA() completely Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2729cfdc |
|
23-Dec-2021 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: use ext4_journal_start/stop for fast commit transactions This patch drops all calls to ext4_fc_start_update() and ext4_fc_stop_update(). To ensure that there are no ongoing journal updates during fast commit, we also make jbd2_fc_begin_commit() lock journal for updates. This way we don't have to maintain two different transaction start stop APIs for fast commit and full commit. This patch doesn't remove the functions altogether since in future we want to have inode level locking for fast commits. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-2-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0705e8d1 |
|
02-Jul-2021 |
Theodore Ts'o <tytso@mit.edu> |
ext4: inline jbd2_journal_[un]register_shrinker() The function jbd2_journal_unregister_shrinker() was getting called twice when the file system was getting unmounted. On Power and ARM platforms this was causing kernel crash when unmounting the file system, when a percpu_counter was destroyed twice. Fix this by removing jbd2_journal_[un]register_shrinker() functions, and inlining the shrinker setup and teardown into journal_init_common() and jbd2_journal_destroy(). This means that ext4 and ocfs2 now no longer need to know about registering and unregistering jbd2's shrinker. Also, while we're at it, rename the percpu counter from j_jh_shrink_count to j_checkpoint_jh_count, since this makes it clearer what this counter is intended to track. Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
16aa4c9a |
|
30-Jun-2021 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: export jbd2_journal_[un]register_shrinker() Export jbd2_journal_[un]register_shrinker() to fix this error when ext4 is built as a module: ERROR: modpost: "jbd2_journal_unregister_shrinker" undefined! ERROR: modpost: "jbd2_journal_register_shrinker" undefined! Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210630083638.140218-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4ba3fcdd |
|
10-Jun-2021 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2,ext4: add a shrinker to release checkpointed buffers Current metadata buffer release logic in bdev_try_to_free_page() have a lot of use-after-free issues when umount filesystem concurrently, and it is difficult to fix directly because ext4 is the only user of s_op->bdev_try_to_free_page callback and we may have to add more special refcount or lock that is only used by ext4 into the common vfs layer, which is unacceptable. One better solution is remove the bdev_try_to_free_page callback, but the real problem is we cannot easily release journal_head on the checkpointed buffer, so try_to_free_buffers() cannot release buffers and page under memory pressure, which is more likely to trigger out-of-memory. So we cannot remove the callback directly before we find another way to release journal_head. This patch introduce a shrinker to free journal_head on the checkpointed transaction. After the journal_head got freed, try_to_free_buffers() could free buffer properly. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
fcf37549 |
|
10-Jun-2021 |
Zhang Yi <yi.zhang@huawei.com> |
jbd2: ensure abort the journal if detect IO error when writing original buffer back Although we merged c044f3d8360 ("jbd2: abort journal if free a async write error metadata buffer"), there is a race between jbd2_journal_try_to_free_buffers() and jbd2_journal_destroy(), so the jbd2_log_do_checkpoint() may still fail to detect the buffer write io error flag which may lead to filesystem inconsistency. jbd2_journal_try_to_free_buffers() ext4_put_super() jbd2_journal_destroy() __jbd2_journal_remove_checkpoint() detect buffer write error jbd2_log_do_checkpoint() jbd2_cleanup_journal_tail() <--- lead to inconsistency jbd2_journal_abort() Fix this issue by introducing a new atomic flag which only have one JBD2_CHECKPOINT_IO_ERROR bit now, and set it in __jbd2_journal_remove_checkpoint() when freeing a checkpoint buffer which has write_io_error flag. Then jbd2_journal_destroy() will detect this mark and abort the journal to prevent updating log tail. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210610112440.3438139-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d07621d9 |
|
08-Jun-2021 |
yangerkun <yangerkun@huawei.com> |
jbd2: clean up misleading comments for jbd2_fc_release_bufs This comments was for jbd2_fc_wait_bufs, not for jbd2_fc_release_bufs. Remove this misleading comments. Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210608141236.459441-1-yangerkun@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
01d5d965 |
|
18-May-2021 |
Leah Rumancik <leah.rumancik@gmail.com> |
ext4: add discard/zeroout flags to journal flush Add a flags argument to jbd2_journal_flush to enable discarding or zero-filling the journal blocks while flushing the journal. Signed-off-by: Leah Rumancik <leah.rumancik@gmail.com> Link: https://lore.kernel.org/r/20210518151327.130198-1-leah.rumancik@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
9bd23c31 |
|
20-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: add a helper to find out number of fast commit blocks Add a helper to read number of fast commit blocks from jbd2 superblock and also rename the JBD2_MIN_FC_BLKS to JBD2_DEFAULT_FAST_COMMIT_BLOCKS since this constant is just the default number of fast commit blocks to use in case number of fast commit blocks isn't set in jbd2 superblock. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201120202232.2240293-2-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
2bf31d94 |
|
16-Nov-2020 |
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> |
jbd2: fix kernel-doc markups Kernel-doc markup should use this format: identifier - description They should not have any type before that, as otherwise the parser won't do the right thing. Also, some identifiers have different names between their prototypes and the kernel-doc markup. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/72f5c6628f5f278d67625f60893ffbc2ca28d46e.1605521731.git.mchehab+huawei@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
87a144f0 |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: don't start fast commit on aborted journal Fast commit should not be started if the journal is aborted. Signed-off-by: Harshad Shirwadkar<harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-22-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
480f89d5 |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: don't read journal->j_commit_sequence without taking a lock Take journal state lock before reading journal->j_commit_sequence. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-13-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0ee66ddc |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: don't touch buffer state until it is filled Fast commit buffers should be filled in before toucing their state. Remove code that sets buffer state as dirty before the buffer is passed to the file system. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-12-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0bce577b |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: don't pass tid to jbd2_fc_end_commit_fallback() In jbd2_fc_end_commit_fallback(), we know which tid to commit. There's no need for caller to pass it. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-10-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c460e5ed |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: don't use state lock during commit path Variables journal->j_fc_off, journal->j_fc_wbuf are accessed during commit path. Since today we allow only one process to perform a fast commit, there is no need take state lock before accessing these variables. This patch removes these locks and adds comments to describe this. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-9-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
a1e5e465 |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4: clean up the JBD2 API that initializes fast commits This patch removes jbd2_fc_init() API and its related functions to simplify enabling fast commits. With this change, the number of fast commit blocks to use is solely determined by the JBD2 layer. So, we move the default value for minimum number of fast commit blocks from ext4/fast_commit.h to include/linux/jbd2.h. However, whether or not to use fast commits is determined by the file system. The file system just sets the fast commit feature using jbd2_journal_set_features(). JBD2 layer then determines how many blocks to use for fast commits (based on the value found in the JBD2 superblock). Note that the JBD2 feature flag of fast commits is just an indication that there are fast commit blocks present on disk. It doesn't tell JBD2 layer about the intent of the file system of whether to it wants to use fast commit or not. That's why, we blindly clear the fast commit flag in journal_reset() after the recovery is done. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-7-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ede7dc7f |
|
05-Nov-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: rename j_maxlen to j_total_len and add jbd2_journal_max_txn_bufs The on-disk superblock field sb->s_maxlen represents the total size of the journal including the fast commit area and is no more the max number of blocks available for a transaction. The maximum number of blocks available to a transaction is reduced by the number of fast commit blocks. So, this patch renames j_maxlen to j_total_len to better represent its intent. Also, it adds a function to calculate max number of bufs available for a transaction. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201106035911.1942128-6-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ff780b91 |
|
15-Oct-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
jbd2: add fast commit machinery This functions adds necessary APIs needed in JBD2 layer for fast commits. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6866d7b3 |
|
15-Oct-2020 |
Harshad Shirwadkar <harshadshirwadkar@gmail.com> |
ext4 / jbd2: add fast commit initialization This patch adds fast commit area trackers in the journal_t structure. These are initialized via the jbd2_fc_init() routine that this patch adds. This patch also adds ext4/fast_commit.c and ext4/fast_commit.h files for fast commit code that will be added in subsequent patches in this series. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
aa3c0c61 |
|
05-Oct-2020 |
Mauricio Faria de Oliveira <mfo@canonical.com> |
jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers() Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
60ed633f |
|
18-Jul-2020 |
Xianting Tian <xianting_tian@126.com> |
jbd2: fix incorrect code style Remove unnecessary blank. Signed-off-by: Xianting Tian <xianting_tian@126.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1595077057-8048-1-git-send-email-xianting_tian@126.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ef3f5830 |
|
20-Jun-2020 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: add the missing unlock_buffer() in the error path of jbd2_write_superblock() jbd2_write_superblock() is under the buffer lock of journal superblock before ending that superblock write, so add a missing unlock_buffer() in in the error path before submitting buffer. Fixes: 742b06b5628f ("jbd2: check superblock mapped prior to committing") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200620061948.2049579-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
7b97d868 |
|
09-Jun-2020 |
zhangyi (F) <yi.zhang@huawei.com> |
ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers In the ext4 filesystem with errors=panic, if one process is recording errno in the superblock when invoking jbd2_journal_abort() due to some error cases, it could be raced by another __ext4_abort() which is setting the SB_RDONLY flag but missing panic because errno has not been recorded. jbd2_journal_commit_transaction() jbd2_journal_abort() journal->j_flags |= JBD2_ABORT; jbd2_journal_update_sb_errno() | ext4_journal_check_start() | __ext4_abort() | sb->s_flags |= SB_RDONLY; | if (!JBD2_REC_ERR) | return; journal->j_flags |= JBD2_REC_ERR; Finally, it will no longer trigger panic because the filesystem has already been set read-only. Fix this by introduce j_abort_mutex to make sure journal abort is completed before panic, and remove JBD2_REC_ERR flag. Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
97a32539 |
|
03-Feb-2020 |
Alexey Dobriyan <adobriyan@gmail.com> |
proc: convert everything to "struct proc_ops" The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in seq_file.h. Conversion rule is: llseek => proc_lseek unlocked_ioctl => proc_ioctl xxx => proc_xxx delete ".owner = THIS_MODULE" line [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c] [sfr@canb.auug.org.au: fix kernel/sched/psi.c] Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
30460e1e |
|
09-Jan-2020 |
Carlos Maiolino <cmaiolino@redhat.com> |
fs: Enable bmap() function to properly return errors By now, bmap() will either return the physical block number related to the requested file offset or 0 in case of error or the requested offset maps into a hole. This patch makes the needed changes to enable bmap() to proper return errors, using the return value as an error return, and now, a pointer must be passed to bmap() to be filled with the mapped physical block. It will change the behavior of bmap() on return: - negative value in case of error - zero on success or map fell into a hole In case of a hole, the *block will be zero too Since this is a prep patch, by now, the only error return is -EINVAL if ->bmap doesn't exist. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7f6225e4 |
|
04-Dec-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: clean __jbd2_journal_abort_hard() and __journal_abort_soft() __jbd2_journal_abort_hard() is no longer used, so now we can merge __jbd2_journal_abort_hard() and __journal_abort_soft() these two functions into jbd2_journal_abort() and remove them. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
0e98c084 |
|
04-Dec-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
jbd2: make sure ESHUTDOWN to be recorded in the journal superblock Commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") want to allow jbd2 layer to distinguish shutdown journal abort from other error cases. So the ESHUTDOWN should be taken precedence over any other errno which has already been recoded after EXT4_FLAGS_SHUTDOWN is set, but it only update errno in the journal suoerblock now if the old errno is 0. Fixes: fb7c02445c49 ("ext4: pass -ESHUTDOWN code to jbd2 layer") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
51f57b01 |
|
04-Dec-2019 |
zhangyi (F) <yi.zhang@huawei.com> |
ext4, jbd2: ensure panic when aborting with zero errno JBD2_REC_ERR flag used to indicate the errno has been updated when jbd2 aborted, and then __ext4_abort() and ext4_handle_error() can invoke panic if ERRORS_PANIC is specified. But if the journal has been aborted with zero errno, jbd2_journal_abort() didn't set this flag so we can no longer panic. Fix this by always record the proper errno in the journal superblock. Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock") Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191204124614.45424-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
1a8e9cf4 |
|
22-Jan-2020 |
Vasily Averin <vvs@virtuozzo.com> |
jbd2_seq_info_next should increase position index if seq_file .next fuction does not change position index, read after some lseek can generate unexpected output. Script below generates endless output $ q=;while read -r r;do echo "$((++q)) $r";done </proc/fs/jbd2/DEV/info https://bugzilla.kernel.org/show_bug.cgi?id=206283 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") Cc: stable@kernel.org Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/d13805e5-695e-8ac3-b678-26ca2313629f@virtuozzo.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
17c51d83 |
|
23-Jan-2020 |
Shijie Luo <luoshijie1@huawei.com> |
jbd2: remove pointless assertion in __journal_remove_journal_head Only when jh->b_jcount = 0 in jbd2_journal_put_journal_head, we are allowed to call __journal_remove_journal_head. This assertion is meaningless, just remove it. Signed-off-by: Shijie Luo <luoshijie1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200123070054.50585-1-luoshijie1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
a09decff |
|
10-Jan-2020 |
Kai Li <li.kai4@h3c.com> |
jbd2: clear JBD2_ABORT flag before journal_reset to update log tail info when load journal If the journal is dirty when the filesystem is mounted, jbd2 will replay the journal but the journal superblock will not be updated by journal_reset() because JBD2_ABORT flag is still set (it was set in journal_init_common()). This is problematic because when a new transaction is then committed, it will be recorded in block 1 (journal->j_tail was set to 1 in journal_reset()). If unclean shutdown happens again before the journal superblock is updated, the new recorded transaction will not be replayed during the next mount (because of stale sb->s_start and sb->s_sequence values) which can lead to filesystem corruption. Fixes: 85e0c4e89c1b ("jbd2: if the journal is aborted then don't allow update of the log tail") Signed-off-by: Kai Li <li.kai4@h3c.com> Link: https://lore.kernel.org/r/20200111022542.5008-1-li.kai4@h3c.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
fdc3ef88 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Reserve space for revoke descriptor blocks Extend functions for starting, extending, and restarting transaction handles to take number of revoke records handle must be able to accommodate. These functions then make sure transaction has enough credits to be able to store resulting revoke descriptor blocks. Also revoke code tracks number of revoke records created by a handle to catch situation where some place didn't reserve enough space for revoke records. Similarly to standard transaction credits, space for unused reserved revoke records is released when the handle is stopped. On the ext4 side we currently take a simplistic approach of reserving space for 1024 revoke records for any transaction. This grows amount of credits reserved for each handle only by a few and is enough for any normal workload so that we don't hit warnings in jbd2. We will refine the logic in following commits. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
9f356e5a |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Account descriptor blocks into t_outstanding_credits Currently, journal descriptor blocks were not accounted in transaction->t_outstanding_credits and we were just leaving some slack space in the journal for them (in jbd2_log_space_left() and jbd2_space_needed()). This is making proper accounting (and reservation we want to add) of descriptor blocks difficult so switch to accounting descriptor blocks in transaction->t_outstanding_credits and just reserve the same amount of credits in t_outstanding credits for journal descriptor blocks when creating transaction. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
b90bfdf5 |
|
05-Nov-2019 |
Jan Kara <jack@suse.cz> |
jbd2: Completely fill journal descriptor blocks With 32-bit block numbers, we don't allocate the array for journal buffer heads large enough for corresponding descriptor tags to fill the descriptor block. Thus we end up writing out half-full descriptor blocks to the journal unnecessarily growing the transaction. Fix the logic to allocate the array large enough. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
7855a57d |
|
09-Aug-2019 |
Thomas Gleixner <tglx@linutronix.de> |
jbd2: Free journal head outside of locked region On PREEMPT_RT bit-spinlocks have the same semantics as on PREEMPT_RT=n, i.e. they disable preemption. That means functions which are not safe to be called in preempt disabled context on RT trigger a might_sleep() assert. The journal head bit spinlock is mostly held for short code sequences with trivial RT safe functionality, except for one place: jbd2_journal_put_journal_head() invokes __journal_remove_journal_head() with the journal head bit spinlock held. __journal_remove_journal_head() invokes kmem_cache_free() which must not be called with preemption disabled on RT. Jan suggested to rework the removal function so the actual free happens outside the bit-spinlocked region. Split it into two parts: - Do the sanity checks and the buffer head detach under the lock - Do the actual free after dropping the lock There is error case handling in the free part which needs to dereference the b_size field of the now detached buffer head. Due to paranoia (caused by ignorance) the size is retrieved in the detach function and handed into the free function. Might be over-engineered, but better safe than sorry. This makes the journal head bit-spinlock usage RT compliant and also avoids nested locking which is not covered by lockdep. Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-ext4@vger.kernel.org Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20190809124233.13277-8-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
46417064 |
|
09-Aug-2019 |
Thomas Gleixner <tglx@linutronix.de> |
jbd2: Make state lock a spinlock Bit-spinlocks are problematic on PREEMPT_RT if functions which might sleep on RT, e.g. spin_lock(), alloc/free(), are invoked inside the lock held region because bit spinlocks disable preemption even on RT. A first attempt was to replace state lock with a spinlock placed in struct buffer_head and make the locking conditional on PREEMPT_RT and DEBUG_BIT_SPINLOCKS. Jan pointed out that there is a 4 byte hole in struct journal_head where a regular spinlock fits in and he would not object to convert the state lock to a spinlock unconditionally. Aside of solving the RT problem, this also gains lockdep coverage for the journal head state lock (bit-spinlocks are not covered by lockdep as it's hard to fit a lockdep map into a single bit). The trivial change would have been to convert the jbd_*lock_bh_state() inlines, but that comes with the downside that these functions take a buffer head pointer which needs to be converted to a journal head pointer which adds another level of indirection. As almost all functions which use this lock have a journal head pointer readily available, it makes more sense to remove the lock helper inlines and write out spin_*lock() at all call sites. Fixup all locking comments as well. Suggested-by: Jan Kara <jack@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Link: https://lore.kernel.org/r/20190809124233.13277-7-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
963abb9a |
|
23-Sep-2019 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
jbd2: remove jbd2_journal_inode_add_[write|wait] Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now, jbd2_journal_inode_add_[write|wait] are not used any more, remove them. Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Acked-by: Changwei Ge <chge@linux.alibaba.com> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
9382cde8 |
|
20-Jun-2019 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: drop declaration of journal_sync_buffer() The journal_sync_buffer() function was never carried over from jbd to jbd2. So get rid of the vestigal declaration of this (non-existent) function. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
6ba0e7dc |
|
20-Jun-2019 |
Ross Zwisler <zwisler@chromium.org> |
jbd2: introduce jbd2_inode dirty range scoping Currently both journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() operate on the entire address space of each of the inodes associated with a given journal entry. The consequence of this is that if we have an inode where we are constantly appending dirty pages we can end up waiting for an indefinite amount of time in journal_finish_inode_data_buffers() while we wait for all the pages under writeback to be written out. The easiest way to cause this type of workload is do just dd from /dev/zero to a file until it fills the entire filesystem. This can cause journal_finish_inode_data_buffers() to wait for the duration of the entire dd operation. We can improve this situation by scoping each of the inode dirty ranges associated with a given transaction. We do this via the jbd2_inode structure so that the scoping is contained within jbd2 and so that it follows the lifetime and locking rules for that structure. This allows us to limit the writeback & wait in journal_submit_inode_data_buffers() and journal_finish_inode_data_buffers() respectively to the dirty range for a given struct jdb2_inode, keeping us from waiting forever if the inode in question is still being appended to. Signed-off-by: Ross Zwisler <zwisler@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
|
#
7821ce41 |
|
30-May-2019 |
Gaowei Pu <pugaowei@gmail.com> |
jbd2: fix some print format mistakes There are some print format mistakes in debug messages. Fix them. Signed-off-by: Gaowei Pu <pugaowei@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
0d52154b |
|
10-May-2019 |
Chengguang Xu <cgxu519@gmail.com> |
jbd2: fix potential double free When failing from creating cache jbd2_inode_cache, we will destroy the previously created cache jbd2_handle_cache twice. This patch fixes this by moving each cache initialization/destruction to its own separate, individual function. Signed-off-by: Chengguang Xu <cgxu519@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
|
#
742b06b5 |
|
06-Apr-2019 |
Jiufei Xue <jiufei.xue@linux.alibaba.com> |
jbd2: check superblock mapped prior to committing We hit a BUG at fs/buffer.c:3057 if we detached the nbd device before unmounting ext4 filesystem. The typical chain of events leading to the BUG: jbd2_write_superblock submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh)); The block device is removed and all the pages are invalidated. JBD2 was trying to write journal superblock to the block device which is no longer present. Fix this by checking the journal superblock's buffer head prior to submitting. Reported-by: Eric Ren <renzhen@linux.alibaba.com> Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
|
#
a58ca992 |
|
14-Feb-2019 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: fold jbd2_superblock_csum_{verify,set} into their callers The functions jbd2_superblock_csum_verify() and jbd2_superblock_csum_set() only get called from one location, so to simplify things, fold them into their callers. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
538bcaa6 |
|
14-Feb-2019 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: fix race when writing superblock The jbd2 superblock is lockless now, so there is probably a race condition between writing it so disk and modifing contents of it, which may lead to checksum error. The following race is the one case that we have captured. jbd2 fsstress jbd2_journal_commit_transaction jbd2_journal_update_sb_log_tail jbd2_write_superblock jbd2_superblock_csum_set jbd2_journal_revoke jbd2_journal_set_features(revork) modify superblock submit_bh(checksum incorrect) Fix this by locking the buffer head before modifing it. We always write the jbd2 superblock after we modify it, so this just means calling the lock_buffer() a little earlier. This checksum corruption problem can be reproduced by xfstests generic/475. Reported-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
53cf9784 |
|
31-Jan-2019 |
Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> |
jbd2: fix deadlock while checkpoint thread waits commit thread to finish This issue was found when I tried to put checkpoint work in a separate thread, the deadlock below happened: Thread1 | Thread2 __jbd2_log_wait_for_space | jbd2_log_do_checkpoint (hold j_checkpoint_mutex)| if (jh->b_transaction != NULL) | ... | jbd2_log_start_commit(journal, tid); |jbd2_update_log_tail | will lock j_checkpoint_mutex, | but will be blocked here. | jbd2_log_wait_commit(journal, tid); | wait_event(journal->j_wait_done_commit, | !tid_gt(tid, journal->j_commit_sequence)); | ... |wake_up(j_wait_done_commit) } | then deadlock occurs, Thread1 will never be waken up. To fix this issue, drop j_checkpoint_mutex in jbd2_log_do_checkpoint() when we are going to wait for transaction commit. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8bdd5b60 |
|
20-May-2018 |
Wang Long <wanglong19@meituan.com> |
jbd2: remove NULL check before calling kmem_cache_destroy() The kmem_cache_destroy() function already checks for null pointers, so we can remove the check at the call site. This patch also sets jbd2_handle_cache and jbd2_inode_cache to be NULL after freeing them in jbd2_journal_destroy_handle_cache(). Signed-off-by: Wang Long <wanglong19@meituan.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
9196f571 |
|
20-May-2018 |
Wang Shilong <wshilong@ddn.com> |
jbd2: remove bunch of empty lines with jbd2 debug See following dmesg output with jbd2 debug enabled: ...(start_this_handle, 313): New handle 00000000c88d6ceb going live. ...(start_this_handle, 383): Handle 00000000c88d6ceb given 53 credits (total 53, free 32681) ...(do_get_write_access, 838): journal_head 0000000002856fc0, force_copy 0 ...(jbd2_journal_cancel_revoke, 421): journal_head 0000000002856fc0, cancelling revoke We have an extra line with every messages, this is a waste of buffer, we can fix it by removing "\n" in the caller or remove it in the __jbd2_debug(), i checked every jbd2_debug() passed '\n' explicitly. To avoid more lines, let's remove it inside __jbd2_debug(). Signed-off-by: Wang Shilong <wshilong@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
85e0c4e8 |
|
18-Feb-2018 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: if the journal is aborted then don't allow update of the log tail This updates the jbd2 superblock unnecessarily, and on an abort we shouldn't truncate the log. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
fb7c0244 |
|
18-Feb-2018 |
Theodore Ts'o <tytso@mit.edu> |
ext4: pass -ESHUTDOWN code to jbd2 layer Previously the jbd2 layer assumed that a file system check would be required after a journal abort. In the case of the deliberate file system shutdown, this should not be necessary. Allow the jbd2 layer to distinguish between these two cases by using the ESHUTDOWN errno. Also add proper locking to __journal_abort_soft(). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
f5166768 |
|
17-Dec-2017 |
Theodore Ts'o <tytso@mit.edu> |
ext4: fix up remaining files with SPDX cleanups A number of ext4 source files were skipped due because their copyright permission statements didn't match the expected text used by the automated conversion utilities. I've added SPDX tags for the rest. While looking at some of these files, I've noticed that we have quite a bit of variation on the licenses that were used --- in particular some of the Red Hat licenses on the jbd2 files use a GPL2+ license, and we have some files that have a LGPL-2.1 license (which was quite surprising). I've not attempted to do any license changes. Even if it is perfectly legal to relicense to GPL 2.0-only for consistency's sake, that should be done with ext4 developer community discussion. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
b8a6176c |
|
01-Nov-2017 |
Jan Kara <jack@suse.cz> |
ext4: Support for synchronous DAX faults We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to prepare blocks for writing and the inode has some uncommitted metadata changes. In the fault handler ext4_dax_fault() we then detect this case (through VM_FAULT_NEEDDSYNC return value) and call helper dax_finish_sync_fault() to flush metadata changes and insert page table entry. Note that this will also dirty corresponding radix tree entry which is what we want - fsync(2) will still provide data integrity guarantees for applications not using userspace flushing. And applications using userspace flushing can avoid calling fsync(2) and thus avoid the performance overhead. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e3c95788 |
|
17-Oct-2017 |
Kees Cook <keescook@chromium.org> |
jbd2: convert timers to use timer_setup() In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.com> Cc: linux-ext4@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de>
|
#
21417136 |
|
05-Mar-2017 |
Ingo Molnar <mingo@kernel.org> |
sched/wait: Standardize 'struct wait_bit_queue' wait-queue entry field name Rename 'struct wait_bit_queue::wait' to ::wq_entry, to more clearly name it as a wait-queue entry. Propagate it to a couple of usage sites where the wait-bit-queue internals are exposed. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
17f423b5 |
|
04-May-2017 |
Jan Kara <jack@suse.cz> |
jbd2: cleanup write flags handling from jbd2_write_superblock() Currently jbd2_write_superblock() silently adds REQ_SYNC to flags with which journal superblock is written. Make this explicit by making flags passed down to jbd2_write_superblock() contain REQ_SYNC. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
eb52da3f |
|
03-May-2017 |
Michal Hocko <mhocko@suse.com> |
jbd2: make the whole kjournald2 kthread NOFS safe kjournald2 is central to the transaction commit processing. As such any potential allocation from this kernel thread has to be GFP_NOFS. Make sure to mark the whole kernel thread GFP_NOFS by the memalloc_nofs_save. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170306131408.9828-8-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <david@fromorbit.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Chris Mason <clm@fb.com> Cc: David Sterba <dsterba@suse.cz> Cc: Brian Foster <bfoster@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Nikolay Borisov <nborisov@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5052b069 |
|
29-Apr-2017 |
Jan Kara <jack@suse.cz> |
jbd2: fix dbench4 performance regression for 'nobarrier' mounts Commit b685d3d65ac7 "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the filesystem is mounted with nobarrier mount option, journal superblock writes ended up being async writes after this patch and that caused heavy performance regression for dbench4 benchmark with high number of processes. In my test setup with HP RAID array with non-volatile write cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%. Fix the problem by making sure journal superblock writes are always treated as synchronous since they generally block progress of the journalling machinery and thus the whole filesystem. Fixes: b685d3d65ac791406e0dfd8779cc9b3707fea5a3 CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
c52c47e4 |
|
29-Apr-2017 |
Jan Kara <jack@suse.cz> |
jbd2: Fix lockdep splat with generic/270 test I've hit a lockdep splat with generic/270 test complaining that: 3216.fsstress.b/3533 is trying to acquire lock: (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150 but task is already holding lock: (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850 The underlying problem is that jbd2_journal_force_commit_nested() (called from ext4_should_retry_alloc()) may get called while a transaction handle is started. In such case it takes care to not wait for commit of the running transaction (which would deadlock) but only for a commit of a transaction that is already committing (which is safe as that doesn't wait for any filesystem locks). In fact there are also other callers of jbd2_log_wait_commit() that take care to pass tid of a transaction that is already committing and for those cases, the lockdep instrumentation is too restrictive and leading to false positive reports. Fix the problem by calling jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the transaction isn't already committing. Fixes: 1eaa566d368b214d99cbb973647c1b0b8102a9ae Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
5f0d5a3a |
|
18-Jan-2017 |
Paul E. McKenney <paulmck@kernel.org> |
mm: Rename SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU A group of Linux kernel hackers reported chasing a bug that resulted from their assumption that SLAB_DESTROY_BY_RCU provided an existence guarantee, that is, that no block from such a slab would be reallocated during an RCU read-side critical section. Of course, that is not the case. Instead, SLAB_DESTROY_BY_RCU only prevents freeing of an entire slab of blocks. However, there is a phrase for this, namely "type safety". This commit therefore renames SLAB_DESTROY_BY_RCU to SLAB_TYPESAFE_BY_RCU in order to avoid future instances of this sort of confusion. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> [ paulmck: Add comments mentioning the old name, as requested by Eric Dumazet, in order to help people familiar with the old name find the new one. ] Acked-by: David Rientjes <rientjes@google.com>
|
#
cd9cb405 |
|
15-Mar-2017 |
Eric Biggers <ebiggers@google.com> |
jbd2: don't leak memory if setting up journal fails In journal_init_common(), if we failed to allocate the j_wbuf array, or if we failed to create the buffer_head for the journal superblock, we leaked the memory allocated for the revocation tables. Fix this. Cc: stable@vger.kernel.org # 4.9 Fixes: f0c9fd5458bacf7b12a9a579a727dc740cbe047e Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
dbfcef6b |
|
01-Feb-2017 |
Sahitya Tummala <stummala@codeaurora.org> |
jbd2: fix use after free in kjournald2() Below is the synchronization issue between unmount and kjournald2 contexts, which results into use after free issue in kjournald2(). Fix this issue by using journal->j_state_lock to synchronize the wait_event() done in journal_kill_thread() and the wake_up() done in kjournald2(). TASK 1: umount cmd: |--jbd2_journal_destroy() { |--journal_kill_thread() { write_lock(&journal->j_state_lock); journal->j_flags |= JBD2_UNMOUNT; ... write_unlock(&journal->j_state_lock); wake_up(&journal->j_wait_commit); TASK 2 wakes up here: kjournald2() { ... checks JBD2_UNMOUNT flag and calls goto end-loop; ... end_loop: write_unlock(&journal->j_state_lock); journal->j_task = NULL; --> If this thread gets pre-empted here, then TASK 1 wait_event will exit even before this thread is completely done. wait_event(journal->j_wait_done_commit, journal->j_task == NULL); ... write_lock(&journal->j_state_lock); write_unlock(&journal->j_state_lock); } |--kfree(journal); } } wake_up(&journal->j_wait_done_commit); --> this step now results into use after free issue. } Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6fa7aa50 |
|
27-Oct-2016 |
Tejun Heo <tj@kernel.org> |
fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex When an ext4 fs is bogged down by a lot of metadata IOs (in the reported case, it was deletion of millions of files, but any massive amount of journal writes would do), after the journal is filled up, tasks which try to access the filesystem and aren't currently performing the journal writes end up waiting in __jbd2_log_wait_for_space() for journal->j_checkpoint_mutex. Because those mutex sleeps aren't marked as iowait, this condition can lead to misleadingly low iowait and /proc/stat:procs_blocked. While iowait propagation is far from strict, this condition can be triggered fairly easily and annotating these sleeps correctly helps initial diagnosis quite a bit. Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that these sleeps are properly marked as iowait. Reported-by: Mingbo Wan <mingbo@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
7c0f6ba6 |
|
24-Dec-2016 |
Linus Torvalds <torvalds@linux-foundation.org> |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
70fd7614 |
|
01-Nov-2016 |
Christoph Hellwig <hch@lst.de> |
block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
f0c9fd54 |
|
14-Sep-2016 |
Geliang Tang <geliangtang@gmail.com> |
jbd2: move more common code into journal_init_common() There are some repetitive code in jbd2_journal_init_dev() and jbd2_journal_init_inode(). So this patch moves the common code into journal_init_common() helper to simplify the code. And fix the coding style warnings reported by checkpatch.pl by the way. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
1eaa566d |
|
30-Jun-2016 |
Jan Kara <jack@suse.cz> |
jbd2: track more dependencies on transaction commit So far we were tracking only dependency on transaction commit due to starting a new handle (which may require commit to start a new transaction). Now add tracking also for other cases where we wait for transaction commit. This way lockdep can catch deadlocks e. g. because we call jbd2_journal_stop() for a synchronous handle with some locks held which rank below transaction start. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
ab714aff |
|
30-Jun-2016 |
Jan Kara <jack@suse.cz> |
jbd2: move lockdep tracking to journal_s Currently lockdep map is tracked in each journal handle. To be able to expand lockdep support to cover also other cases where we depend on transaction commit and where handle is not available, move lockdep map into struct journal_s. Since this makes the lockdep map shared for all handles, we have to use rwsem_acquire_read() for acquisitions now. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
f2db1971 |
|
24-Jun-2016 |
Michal Hocko <mhocko@suse.com> |
jbd2: get rid of superfluous __GFP_REPEAT jbd2_alloc is explicit about its allocation preferences wrt. the allocation size. Sub page allocations go to the slab allocator and larger are using either the page allocator or vmalloc. This is all good but the logic is unnecessarily complex. 1) as per Ted, the vmalloc fallback is a left-over: : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so : the code path that calls vmalloc() should never get called. When we : conveted jbd2_alloc() to suppor sub-page size allocations in commit : d2eecb039368, there was an assumption that it could be called with a size : greater than PAGE_SIZE, but that's certaily not true today. Moreover vmalloc allocation might even lead to a deadlock because the callers expect GFP_NOFS context while vmalloc is GFP_KERNEL. 2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored since the flag was introduced. Let's simplify the code flow and use the slab allocator for sub-page requests and the page allocator for others. Even though order > 0 is not currently used as per above leave that option open. Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
28a8f0d3 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
dfec8a14 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
fs: have ll_rw_block users pass in op and flags separately This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
2a222ca9 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
fs: have submit_bh users pass in op and flags separately This has submit_bh users pass in the operation and flags separately, so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
41617e1a |
|
23-Apr-2016 |
Jan Kara <jack@suse.cz> |
jbd2: add support for avoiding data writes during transaction commits Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c0a2ad9b |
|
09-Mar-2016 |
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> |
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
cb0d9d47 |
|
22-Feb-2016 |
Jan Kara <jack@suse.cz> |
jbd2: save some atomic ops in __JI_COMMIT_RUNNING handling Currently we used atomic bit operations to manipulate __JI_COMMIT_RUNNING bit. However this is unnecessary as i_flags are always written and read under j_list_lock. So just change the operations to standard bit operations. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
1101cd4d |
|
22-Feb-2016 |
Jan Kara <jack@suse.cz> |
jbd2: unify revoke and tag block checksum handling Revoke and tag descriptor blocks are just different kinds of descriptor blocks and thus have checksum in the same place. Unify computation and checking of checksums for these. Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
32ab6715 |
|
22-Feb-2016 |
Jan Kara <jack@suse.cz> |
jbd2: factor out common descriptor block initialization Descriptor block header is initialized in several places. Factor out the common code into jbd2_journal_get_descriptor_buffer(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
4327ba52 |
|
18-Oct-2015 |
Daeho Jeong <daeho.jeong@samsung.com> |
ext4, jbd2: ensure entering into panic after recording an error in superblock If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
56316a0d |
|
17-Oct-2015 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: clean up feature test macros with predicate functions Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6a797d27 |
|
17-Oct-2015 |
Darrick J. Wong <darrick.wong@oracle.com> |
ext4: call out CRC and corruption errors with specific error codes Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
8595798c |
|
15-Oct-2015 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: gate checksum calculations on crc driver presence, not sb flags Change the journal's checksum functions to gate on whether or not the crc32c driver is loaded, and gate the loading on the superblock bits. This prevents a journal crash if someone loads a journal in no-csum mode and then randomizes the superblock, thus flipping on the feature bits. Tested-By: Nikolay Borisov <kernel@kyup.com> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
841df7df |
|
28-Jul-2015 |
Jan Kara <jack@suse.com> |
jbd2: avoid infinite loop when destroying aborted journal Commit 6f6a6fda2945 "jbd2: fix ocfs2 corrupt when updating journal superblock fails" changed jbd2_cleanup_journal_tail() to return EIO when the journal is aborted. That makes logic in jbd2_log_do_checkpoint() bail out which is fine, except that jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make a progress in cleaning the journal. Without it jbd2_journal_destroy() just loops in an infinite loop. Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of jbd2_log_do_checkpoint() fails with error. Reported-by: Eryu Guan <guaneryu@gmail.com> Tested-by: Eryu Guan <guaneryu@gmail.com> Fixes: 6f6a6fda294506dfe0e3e0a253bb2d2923f28f0a Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
564bc402 |
|
23-Jul-2015 |
Daeho Jeong <daeho.jeong@samsung.com> |
ext4, jbd2: add REQ_FUA flag when recording an error in the superblock When an error condition is detected, an error status should be recorded into superblocks of EXT4 or JBD2. However, the write request is submitted now without REQ_FUA flag, even in "barrier=1" mode, which is followed by panic() function in "errors=panic" mode. On mobile devices which make whole system reset as soon as kernel panic occurs, this write request containing an error flag will disappear just from storage cache without written to the physical cells. Therefore, when next start, even forever, the error flag cannot be shown in both superblocks, and e2fsck cannot fix the filesystem problems automatically, unless e2fsck is executed in force checking mode. [ Changed use test_opt(sb, BARRIER) of checking the journal flags -- TYT ] Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
81ae394b |
|
25-Jun-2015 |
Rasmus Villemoes <linux@rasmusvillemoes.dk> |
fs/jbd2/journal.c: use strreplace() In one case, we eliminate a local variable; in the other a strlen() call and some .text. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7b506b10 |
|
15-Jun-2015 |
Michal Hocko <mhocko@suse.cz> |
jbd2: get rid of open coded allocation retry loop insert_revoke_hash does an open coded endless allocation loop if journal_oom_retry is true. It doesn't implement any allocation fallback strategy between the retries, though. The memory allocator doesn't know about the never fail requirement so it cannot potentially help to move on with the allocation (e.g. use memory reserves). Get rid of the retry loop and use __GFP_NOFAIL instead. We will lose the debugging message but I am not sure it is anyhow helpful. Do the same for journal_alloc_journal_head which is doing a similar thing. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6f6a6fda |
|
15-Jun-2015 |
Joseph Qi <joseph.qi@huawei.com> |
jbd2: fix ocfs2 corrupt when updating journal superblock fails If updating journal superblock fails after journal data has been flushed, the error is omitted and this will mislead the caller as a normal case. In ocfs2, the checkpoint will be treated successfully and the other node can get the lock to update. Since the sb_start is still pointing to the old log block, it will rewrite the journal data during journal recovery by the other node. Thus the new updates will be overwritten and ocfs2 corrupts. So in above case we have to return the error, and ocfs2_commit_cache will take care of the error and prevent the other node to do update first. And only after recovering journal it can do the new updates. The issue discussion mail can be found at: https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html http://comments.gmane.org/gmane.comp.file-systems.ext4/48841 [ Fixed bug in patch which allowed a non-negative error return from jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this was causing xfstests ext4/306 to fail. -- Ted ] Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: stable@vger.kernel.org
|
#
de92c8ca |
|
07-Jun-2015 |
Jan Kara <jack@suse.cz> |
jbd2: speedup jbd2_journal_get_[write|undo]_access() jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are frequently called for buffers that are already part of the running transaction - most frequently it is the case for bitmaps, inode table blocks, and superblock. Since in such cases we have nothing to do, it is unfortunate we still grab reference to journal head, lock the bh, lock bh_state only to find out there's nothing to do. Improving this is a bit subtle though since until we find out journal head is attached to the running transaction, it can disappear from under us because checkpointing / commit decided it's no longer needed. We deal with this by protecting journal_head slab with RCU. We still have to be careful about journal head being freed & reallocated within slab and about exposing journal head in consistent state (in particular b_modified and b_frozen_data must be in correct state before we allow user to touch the buffer). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
6ccaf3e2 |
|
08-Jun-2015 |
Michal Hocko <mhocko@suse.cz> |
jbd2: revert must-not-fail allocation loops back to GFP_NOFAIL This basically reverts 47def82672b3 (jbd2: Remove __GFP_NOFAIL from jbd2 layer). The deprecation of __GFP_NOFAIL was a bad choice because it led to open coding the endless loop around the allocator rather than removing the dependency on the non failing allocation. So the deprecation was a clear failure and the reality tells us that __GFP_NOFAIL is not even close to go away. It is still true that __GFP_NOFAIL allocations are generally discouraged and new uses should be evaluated and an alternative (pre-allocations or reservations) should be considered but it doesn't make any sense to lie the allocator about the requirements. Allocator can take steps to help making a progress if it knows the requirements. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Acked-by: David Rientjes <rientjes@google.com>
|
#
32f38691 |
|
01-Dec-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: fix regression where we fail to initialize checksum seed when loading When we're enabling journal features, we cannot use the predicate jbd2_journal_has_csum_v2or3() because we haven't yet set the sb feature flag fields! Moreover, we just finished loading the shash driver, so the test is unnecessary; calculate the seed always. Without this patch, we fail to initialize the checksum seed the first time we turn on journal_checksum, which means that all journal blocks written during that first mount are corrupt. Transactions written after the second mount will be fine, since the feature flag will be set in the journal superblock. xfstests generic/{034,321,322} are the regression tests. (This is important for 3.18.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.coM> Reported-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
d9f39d1e |
|
25-Nov-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: remove unnecessary NULL check before iput() Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
feb8c6d3 |
|
11-Sep-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: fix journal checksum feature flag handling Clear all three journal checksum feature flags before turning on whichever journal checksum options we want. Rearrange the error checking so that newer flags get complained about first. Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
a49058fa |
|
04-Sep-2014 |
Gioh Kim <gioh.kim@lge.com> |
jbd/jbd2: use non-movable memory for the jbd superblock Sicne the jbd/jbd2 superblock is not released until the file system is unmounted, allocate the buffer cache from the non-moveable area to allow page migration and CMA allocations to more easily succeed. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
db9ee220 |
|
27-Aug-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: fix descriptor block size handling errors with journal_csum It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
3469a32a |
|
08-Mar-2014 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: don't hold j_state_lock while calling wake_up() The j_state_lock is one of the hottest locks in the jbd2 layer and thus one of its scalability bottlenecks. We don't need to be holding the j_state_lock while we are calling wake_up(&journal->j_wait_commit), so release the lock a little bit earlier. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
7747e6d0 |
|
17-Feb-2014 |
Rashika Kheria <rashika.kheria@gmail.com> |
jbd2: mark file-local functions as static Mark functions as static in jbd2/journal.c because they are not used outside this file. This eliminates the following warning in jbd2/journal.c: fs/jbd2/journal.c:125:5: warning: no previous prototype for ‘jbd2_verify_csum_type’ [-Wmissing-prototypes] fs/jbd2/journal.c:146:5: warning: no previous prototype for ‘jbd2_superblock_csum_verify’ [-Wmissing-prototypes] fs/jbd2/journal.c:154:6: warning: no previous prototype for ‘jbd2_superblock_csum_set’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
a67c848a |
|
08-Dec-2013 |
Dmitry Monakhov <dmonakhov@openvz.org> |
jbd2: rename obsoleted msg JBD->JBD2 Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
75685071 |
|
08-Dec-2013 |
Jan Kara <jack@suse.cz> |
jbd2: revise KERN_EMERG error messages Some of KERN_EMERG printk messages do not really deserve this log level and the one in log_wait_commit() is even rather useless (the journal has been previously aborted and *that* is where we should have been complaining). So make some messages just KERN_ERR and remove the useless message. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
18a6ea1e |
|
28-Aug-2013 |
Darrick J. Wong <darrick.wong@oracle.com> |
jbd2: Fix endian mixing problems in the checksumming code In the jbd2 checksumming code, explicitly declare separate variables with endianness information so that we don't get confused and screw things up again. Also fixes sparse warnings. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
fe52d17c |
|
01-Jul-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: move superblock checksum calculation to jbd2_write_superblock() Some of the functions which modify the jbd2 superblock were not updating the checksum before calling jbd2_write_superblock(). Move the call to jbd2_superblock_csum_set() to jbd2_write_superblock(), so that the checksum is calculated consistently. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: stable@vger.kernel.org
|
#
169f1a2a |
|
12-Jun-2013 |
Paul Gortmaker <paul.gortmaker@windriver.com> |
jbd2: use a single printk for jbd_debug() Since the jbd_debug() is implemented with two separate printk() calls, it can lead to corrupted and misleading debug output like the following (see lines marked with "*"): [ 290.339362] (fs/jbd2/journal.c, 203): kjournald2: kjournald2 wakes [ 290.339365] (fs/jbd2/journal.c, 155): kjournald2: commit_sequence=42103, commit_request=42104 [ 290.339369] (fs/jbd2/journal.c, 158): kjournald2: OK, requests differ [* 290.339376] (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: [* 290.339379] (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: want 42104, j_commit_sequence=42103 [* 290.339382] JBD2: starting commit of transaction 42104 [ 290.339410] (fs/jbd2/revoke.c, 566): jbd2_journal_write_revoke_records: Wrote 0 revoke records [ 290.376555] (fs/jbd2/commit.c, 1088): jbd2_journal_commit_transaction: JBD2: commit 42104 complete, head 42079 i.e. the debug output from log_wait_commit and journal_commit_transaction have become interleaved. The output should have been: (fs/jbd2/journal.c, 648): jbd2_log_wait_commit: JBD2: want 42104, j_commit_sequence=42103 (fs/jbd2/commit.c, 370): jbd2_journal_commit_transaction: JBD2: starting commit of transaction 42104 It is expected that this is not easy to replicate -- I was only able to cause it on preempt-rt kernels, and even then only under heavy I/O load. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Suggested-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9ff86446 |
|
12-Jun-2013 |
Dmitry Monakhov <dmonakhov@openvz.org> |
jbd2: optimize jbd2_journal_force_commit Current implementation of jbd2_journal_force_commit() is suboptimal because result in empty and useless commits. But callers just want to force and wait any unfinished commits. We already have jbd2_journal_force_commit_nested() which does exactly what we want, except we are guaranteed that we do not hold journal transaction open. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8f7d89f3 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: transaction reservation support In some cases we cannot start a transaction because of locking constraints and passing started transaction into those places is not handy either because we could block transaction commit for too long. Transaction reservation is designed to solve these issues. It reserves a handle with given number of credits in the journal and the handle can be later attached to the running transaction without blocking on commit or checkpointing. Reserved handles do not block transaction commit in any way, they only reduce maximum size of the running transaction (because we have to always be prepared to accomodate request for attaching reserved handle). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f29fad72 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: remove unused waitqueues j_wait_logspace and j_wait_checkpoint are unused. Remove them. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
76c39904 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: cleanup needed free block estimates when starting a transaction __jbd2_log_space_left() and jbd_space_needed() were kind of odd. jbd_space_needed() accounted also credits needed for currently committing transaction while it didn't account for credits needed for control blocks. __jbd2_log_space_left() then accounted for control blocks as a fraction of free space. Since results of these two functions are always only compared against each other, this works correct but is somewhat strange. Move the estimates so that jbd_space_needed() returns number of blocks needed for a transaction including control blocks and __jbd2_log_space_left() returns free space in the journal (with the committing transaction already subtracted). Rename functions to jbd2_log_space_left() and jbd2_space_needed() while we are changing them. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b34090e5 |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: refine waiting for shadow buffers Currently when we add a buffer to a transaction, we wait until the buffer is removed from BJ_Shadow list (so that we prevent any changes to the buffer that is just written to the journal). This can take unnecessarily long as a lot happens between the time the buffer is submitted to the journal and the time when we remove the buffer from BJ_Shadow list. (e.g. We wait for all data buffers in the transaction, we issue a cache flush, etc.) Also this creates a dependency of do_get_write_access() on transaction commit (namely waiting for data IO to complete) which we want to avoid when implementing transaction reservation. So we modify commit code to set new BH_Shadow flag when temporary shadowing buffer is created and we clear that flag once IO on that buffer is complete. This allows do_get_write_access() to wait only for BH_Shadow bit and thus removes the dependency on data IO completion. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e5a120ae |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: remove journal_head from descriptor buffers Similarly as for metadata buffers, also log descriptor buffers don't really need the journal head. So strip it and remove BJ_LogCtl list. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f5113eff |
|
03-Jun-2013 |
Jan Kara <jack@suse.cz> |
jbd2: don't create journal_head for temporary journal buffers When writing metadata to the journal, we create temporary buffer heads for that task. We also attach journal heads to these buffer heads but the only purpose of the journal heads is to keep buffers linked in transaction's BJ_IO list. We remove the need for journal heads by reusing buffer_head's b_assoc_buffers list for that purpose. Also since BJ_IO list is just a temporary list for transaction commit, we use a private list in jbd2_journal_commit_transaction() for that thus removing BJ_IO list from transaction completely. Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
5d9cf9c6 |
|
28-May-2013 |
Zheng Liu <wenqing.lz@taobao.com> |
jbd2: use kmem_cache_zalloc for allocating journal head This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/ memset when a new journal head is alloctated. Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Cc: "Theodore Ts'o" <tytso@mit.edu>
|
#
e7600409 |
|
29-Apr-2013 |
majianpeng <majianpeng@gmail.com> |
fs/buffer.c: remove unnecessary init operation after allocating buffer_head. bh allocation uses kmem_cache_zalloc() so we needn't call 'init_buffer(bh, NULL, NULL)' and perform other set-zero-operations. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d9dda78b |
|
31-Mar-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
procfs: new helper - PDE_DATA(inode) The only part of proc_dir_entry the code outside of fs/proc really cares about is PDE(inode)->data. Provide a helper for that; static inline for now, eventually will be moved to fs/proc, along with the knowledge of struct proc_dir_entry layout. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d76a3a77 |
|
03-Apr-2013 |
Theodore Ts'o <tytso@mit.edu> |
ext4/jbd2: don't wait (forever) for stale tid caused by wraparound In the case where an inode has a very stale transaction id (tid) in i_datasync_tid or i_sync_tid, it's possible that after a very large (2**31) number of transactions, that the tid number space might wrap, causing tid_geq()'s calculations to fail. Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily", attempted to fix this problem, but it only avoided kjournald spinning forever by fixing the logic in jbd2_log_start_commit(). Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c that might call jbd2_log_start_commit() with a stale tid, those functions will subsequently call jbd2_log_wait_commit() with the same stale tid, and then wait for a very long time. To fix this, we replace the calls to jbd2_log_start_commit() and jbd2_log_wait_commit() with a call to a new function, jbd2_complete_transaction(), which will correctly handle stale tid's. As a bonus, jbd2_complete_transaction() will avoid locking j_state_lock for writing unless a commit needs to be started. This should have a small (but probably not measurable) improvement for ext4's scalability. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Ben Hutchings <ben@decadent.org.uk> Reported-by: George Barnett <gbarnett@atlassian.com> Cc: stable@vger.kernel.org
|
#
b6e96d00 |
|
09-Feb-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: use module parameters instead of debugfs for jbd_debug There are multiple reasons to move away from debugfs. First of all, we are only using it for a single parameter, and it is much more complicated to set up (some 30 lines of code compared to 3), and one more thing that might fail while loading the jbd2 module. Secondly, as a module paramter it can be specified as a boot option if jbd2 is built into the kernel, or as a parameter when the module is loaded, and it can also be manipulated dynamically under /sys/module/jbd2/parameters/jbd2_debug. So it is more flexible. Ultimately we want to move away from using jbd_debug() towards tracepoints, but for now this is still a useful simplification of the code base. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9fff24aa |
|
06-Feb-2013 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: track request delay statistics Track the delay between when we first request that the commit begin and when it actually begins, so we can see how much of a gap exists. In theory, this should just be the remaining scheduling quantuum of the thread which requested the commit (assuming it was not a synchronous operation which triggered the commit request) plus scheduling overhead; however, it's possible that real time processes might get in the way of letting the kjournald thread from executing. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e7b04ac0 |
|
29-Jan-2013 |
Eric Sandeen <sandeen@redhat.com> |
jbd2: don't wake kjournald unnecessarily Don't send an extra wakeup to kjournald in the case where we already have the proper target in j_commit_request, i.e. that transaction has already been requested for commit. commit deeeaf13 "jbd2: fix fsync() tid wraparound bug" changed the logic leading to a wakeup, but it caused some extra wakeups which were found to lead to a measurable performance regression. Signed-off-by: Eric Sandeen <sandeen@redhat.com> [tytso@mit.edu: reworked check to make it clearer] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
37be2f59 |
|
08-Nov-2012 |
Eric Sandeen <sandeen@redhat.com> |
ext4: remove ext4_handle_release_buffer() ext4_handle_release_buffer() was intended to remove journal write access from a buffer, but it doesn't actually do anything at all other than add a BUFFER_TRACE point, but it's not reliably used for that either. Remove all the associated dead code. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
|
#
eeecef0a |
|
18-Aug-2012 |
Eric Sandeen <sandeen@redhat.com> |
jbd2: don't write superblock when if its empty This sequence: # truncate --size=1g fsfile # mkfs.ext4 -F fsfile # mount -o loop,ro fsfile /mnt # umount /mnt # dmesg | tail results in an IO error when unmounting the RO filesystem: [ 318.020828] Buffer I/O error on device loop1, logical block 196608 [ 318.027024] lost page write due to I/O error on loop1 [ 318.032088] JBD2: Error -5 detected when updating journal superblock for loop1-8. This was a regression introduced by commit 24bcc89c7e7c: "jbd2: split updating of journal superblock and marking journal empty". Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
|
#
d796c52e |
|
05-Aug-2012 |
Theodore Ts'o <tytso@mit.edu> |
ext4: make sure the journal sb is written in ext4_clear_journal_err() After we transfer set the EXT4_ERROR_FS bit in the file system superblock, it's not enough to call jbd2_journal_clear_err() to clear the error indication from journal superblock --- we need to call jbd2_journal_update_sb_errno() as well. Otherwise, when the root file system is mounted read-only, the journal is replayed, and the error indicator is transferred to the superblock --- but the s_errno field in the jbd2 superblock is left set (since although we cleared it in memory, we never flushed it out to disk). This can end up confusing e2fsck. We should make e2fsck more robust in this case, but the kernel shouldn't be leaving things in this confused state, either. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
|
#
12810ad7 |
|
25-Jul-2012 |
Artem Bityutskiy <artem.bityutskiy@linux.intel.com> |
jbd/jbd2: nuke write_super from comments The '->write_super' superblock method is gone, and this patch removes all the references to 'write_super' from various jbd and jbd2. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c3900875 |
|
27-May-2012 |
Darrick J. Wong <djwong@us.ibm.com> |
jbd2: checksum data blocks that are stored in the journal Calculate and verify checksums of each data block being stored in the journal. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
4fd5ea43 |
|
27-May-2012 |
Darrick J. Wong <djwong@us.ibm.com> |
jbd2: checksum journal superblock Calculate and verify a checksum covering the journal superblock. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
01b5adce |
|
27-May-2012 |
Darrick J. Wong <djwong@us.ibm.com> |
jbd2: Grab a reference to the crc32c driver if necessary Obtain a reference to the crc32c driver if needed for the v2 checksum. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
25ed6e8a |
|
27-May-2012 |
Darrick J. Wong <djwong@us.ibm.com> |
jbd2: enable journal clients to enable v2 checksumming Add in the necessary code so that journal clients can enable the new journal checksumming features. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
9ffc93f2 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
#
303a8f2a |
|
25-Nov-2011 |
Cong Wang <amwang@redhat.com> |
jbd2: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>
|
#
3339578f |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: cleanup journal tail after transaction commit Normally, we have to issue a cache flush before we can update journal tail in journal superblock, effectively wiping out old transactions from the journal. So use the fact that during transaction commit we issue cache flush anyway and opportunistically push journal tail as far as we can. Since update of journal superblock is still costly (we have to use WRITE_FUA), we update log tail only if we can free significant amount of space. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
79feb521 |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: issue cache flush after checkpointing even with internal journal When we reach jbd2_cleanup_journal_tail(), there is no guarantee that checkpointed buffers are on a stable storage - especially if buffers were written out by jbd2_log_do_checkpoint(), they are likely to be only in disk's caches. Thus when we update journal superblock effectively removing old transaction from journal, this write of superblock can get to stable storage before those checkpointed buffers which can result in filesystem corruption after a crash. Thus we must unconditionally issue a cache flush before we update journal superblock in these cases. A similar problem can also occur if journal superblock is written only in disk's caches, other transaction starts reusing space of the transaction cleaned from the log and power failure happens. Subsequent journal replay would still try to replay the old transaction but some of it's blocks may be already overwritten by the new transaction. For this reason we must use WRITE_FUA when updating log tail and we must first write new log tail to disk and update in-memory information only after that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
35c80422 |
|
03-Feb-2012 |
Nigel Cunningham <nigel@tuxonice.net> |
PM / Sleep: JBD and JBD2 missing set_freezable() With the latest and greatest changes to the freezer, I started seeing panics that were caused by jbd2 running post-process freezing and hitting the canary BUG_ON for non-TuxOnIce I/O submission. I've traced this back to a lack of set_freezable calls in both jbd and jbd2. Since they're clearly meant to be frozen (there are tests for freezing()), I submit the following patch to add the missing calls. Signed-off-by: Nigel Cunningham <nigel@tuxonice.net> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
#
a78bb11d |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: protect all log tail updates with j_checkpoint_mutex There are some log tail updates that are not protected by j_checkpoint_mutex. Some of these are harmless because they happen during startup or shutdown but updates in jbd2_journal_commit_transaction() and jbd2_journal_flush() can really race with other log tail updates (e.g. someone doing jbd2_journal_flush() with someone running jbd2_cleanup_journal_tail()). So protect all log tail updates with j_checkpoint_mutex. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
24bcc89c |
|
13-Mar-2012 |
Jan Kara <jack@suse.cz> |
jbd2: split updating of journal superblock and marking journal empty There are three case of updating journal superblock. In the first case, we want to mark journal as empty (setting s_sequence to 0), in the second case we want to update log tail, in the third case we want to update s_errno. Split these cases into separate functions. It makes the code slightly more straightforward and later patches will make the distinction even more important. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
43e625d8 |
|
20-Feb-2012 |
Eric Sandeen <sandeen@redhat.com> |
ext4: remove the journal=update mount option The V2 journal format was introduced around ten years ago, for ext3. It seems highly unlikely that anyone will need this migration option for ext4. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
4185a2ac |
|
20-Feb-2012 |
Yongqiang Yang <xiaoqiangnk@gmail.com> |
jbd2: rename functions which initialize slab caches This patch renames functions initializing the slab caches for the journal head and handle structures to so they are consistent with the names of the corresponding functions which destroys those slab caches. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
0c2022ec |
|
20-Feb-2012 |
Yongqiang Yang <xiaoqiangnk@gmail.com> |
jbd2: allocate transaction from separate slab cache There is normally only a handful of these active at any one time, but putting them in a separate slab cache makes debugging memory corruption problems easier. Manish Katiyar also wanted this make it easier to test memory failure scenarios in the jbd2 layer. Cc: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
2201c590 |
|
20-Feb-2012 |
Seiji Aguchi <seiji.aguchi@hds.com> |
jbd2: add drop_transaction/update_superblock_end tracepoints This patch adds trace_jbd2_drop_transaction and trace_jbd2_update_superblock_end because there are similar tracepoints in jbd and they are needed in jbd2 as well. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
a0acae0e |
|
21-Nov-2011 |
Tejun Heo <tj@kernel.org> |
freezer: unexport refrigerator() and update try_to_freeze() slightly There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or removes any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
|
#
f2a44523 |
|
01-Nov-2011 |
Eryu Guan <guaneryu@gmail.com> |
jbd2: Unify log messages in jbd2 code Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the same time other jbd2 code prints with "JBD: " prefix. Unify the prefix to "JBD2: ". Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8762202d |
|
01-Nov-2011 |
Eryu Guan <guaneryu@gmail.com> |
jbd/jbd2: validate sb->s_first in journal_get_superblock() I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3 image has s_first = 0 in journal superblock, and the 0 is passed to journal->j_head in journal_reset(), then to blocknr in cleanup_journal_tail(), in the end the J_ASSERT failed. So validate s_first after reading journal superblock from disk in journal_get_superblock() to ensure s_first is valid. The following script could reproduce it: fstype=ext3 blocksize=1024 img=$fstype.img offset=0 found=0 magic="c0 3b 39 98" dd if=/dev/zero of=$img bs=1M count=8 mkfs -t $fstype -b $blocksize -F $img filesize=`stat -c %s $img` while [ $offset -lt $filesize ] do if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then echo "Found journal: $offset" found=1 break fi offset=`echo "$offset+$blocksize" | bc` done if [ $found -ne 1 ];then echo "Magic \"$magic\" not found" exit 1 fi dd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1 mkdir -p ./mnt mount -o loop $img ./mnt Cc: Jan Kara <jack@suse.cz> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
4862fd60 |
|
10-Jul-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: remove jbd2_dev_to_name() from jbd2 tracepoints Using function calls in TP_printk causes perf heartburn, so print the MAJOR/MINOR device numbers instead. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
de1b7941 |
|
13-Jun-2011 |
Jan Kara <jack@suse.cz> |
jbd2: Fix oops in jbd2_journal_remove_journal_head() jbd2_journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 jbd2_journal_commit_transaction() ... processing t_forget list __jbd2_journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); jbd2_journal_try_to_free_buffers() jbd2_journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() jbd2_journal_put_journal_head(jh) jbd2_journal_remove_journal_head(bh); jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after jbd2_journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via jbd2_journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]jbd2_journal_refile_buffer(), [__]jbd2_journal_unfile_buffer(), and __jdb2_journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
bbd2be36 |
|
24-May-2011 |
Jan Kara <jack@suse.cz> |
jbd2: Add function jbd2_trans_will_send_data_barrier() Provide a function which returns whether a transaction with given tid will send a flush to the filesystem device. The function will be used by ext4 to detect whether fsync needs to send a separate flush or not. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
1be2add6 |
|
08-May-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: only print the debugging information for tid wraparound once If we somehow wrap, we don't want to keep printing the warning message over and over again. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
deeeaf13 |
|
01-May-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: fix fsync() tid wraparound bug If an application program does not make any changes to the indirect blocks or extent tree, i_datasync_tid will not get updated. If there are enough commits (i.e., 2**31) such that tid_geq()'s calculations wrap, and there isn't a currently active transaction at the time of the fdatasync() call, this can end up triggering a BUG_ON in fs/jbd2/commit.c: J_ASSERT(journal->j_running_transaction != NULL); It's pretty rare that this can happen, since it requires the use of fdatasync() plus *very* frequent and excessive use of fsync(). But with the right workload, it can. We fix this by replacing the use of tid_geq() with an equality test, since there's only one valid transaction id that we is valid for us to wait until it is commited: namely, the currently running transaction (if it exists). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
50f689af |
|
03-Apr-2011 |
Zhu Yanhai <zhu.yanhai@gmail.com> |
jbd2: move bdget out of critical section bdget() should not be called when we hold spinlocks since it might sleep. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
25985edc |
|
30-Mar-2011 |
Lucas De Marchi <lucas.demarchi@profusion.mobi> |
Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
|
#
3c26bdb4 |
|
26-Feb-2011 |
Justin P. Mattock <justinmattock@gmail.com> |
jbd: Remove one to many n's in a word. The Patch below removes one to many "n's" in a word.. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
e4471831 |
|
12-Feb-2011 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: call __jbd2_log_start_commit with j_state_lock write locked On an SMP ARM system running ext4, I've received a report that the first J_ASSERT in jbd2_journal_commit_transaction has been triggering: J_ASSERT(journal->j_running_transaction != NULL); While investigating possible causes for this problem, I noticed that __jbd2_log_start_commit() is getting called with j_state_lock only read-locked, in spite of the fact that it's possible for it might j_commit_request. Fix this by grabbing the necessary information so we can test to see if we need to start a new transaction before dropping the read lock, and then calling jbd2_log_start_commit() which will grab the write lock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8aefcd55 |
|
09-Jan-2011 |
Theodore Ts'o <tytso@mit.edu> |
ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary Replace the jbd2_inode structure (which is 48 bytes) with a pointer and only allocate the jbd2_inode when it is needed --- that is, when the file system has a journal present and the inode has been opened for writing. This allows us to further slim down the ext4_inode_info structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b7271b0a |
|
18-Dec-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: simplify return path of journal_init_common Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
670be5a7 |
|
17-Dec-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Use pr_notice_ratelimited() in journal_alloc_journal_head() We had an open-coded version of printk_ratelimited(); use the provided abstraction to make the code cleaner and easier to understand. Based on a similar patch for fs/jbd from Namhyung Kim <namhyung@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
0587aa3d |
|
17-Nov-2010 |
yangsheng <sheng.yang@oracle.com> |
jbd2: fix /proc/fs/jbd2/<dev> when using an external journal In jbd2_journal_init_dev(), we need make sure the journal structure is fully initialzied before calling jbd2_stats_proc_init(). Reviewed-by: Andreas Dilger <andreas.dilger@oracle.com> Signed-off-by: yangsheng <sheng.yang@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
51dfacde |
|
16-Oct-2010 |
Thomas Gleixner <tglx@linutronix.de> |
jbd2: Convert jbd2_slab_create_sem to mutex jbd2_slab_create_sem is used as a mutex, so make it one. [ akpm muttered: We may as well make it local to jbd2_journal_create_slab() also. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ted Ts'o <tytso@mit.edu> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <alpine.LFD.2.00.1010162231480.2496@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
#
39e3ac25 |
|
27-Oct-2010 |
Brian King <brking@linux.vnet.ibm.com> |
jbd2: Fix I/O hang in jbd2_journal_release_jbd_inode This fixes a hang seen in jbd2_journal_release_jbd_inode on a lot of Power 6 systems running with ext4. When we get in the hung state, all I/O to the disk in question gets blocked where we stay indefinitely. Looking at the task list, I can see we are stuck in jbd2_journal_release_jbd_inode waiting on a wake up. I added some debug code to detect this scenario and dump additional data if we were stuck in jbd2_journal_release_jbd_inode for longer than 30 minutes. When it hit, I was able to see that i_flags was 0, suggesting we missed the wake up. This patch changes i_flags to be an unsigned long, uses bit operators to access it, and adds barriers around the accesses. Prior to applying this patch, we were regularly hitting this hang on numerous systems in our test environment. After applying the patch, the hangs no longer occur. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
bcf3d0bc |
|
16-Oct-2010 |
Andrea Gelmini <andrea.gelmini@gelma.net> |
jbd/2: fixed typos "wakup" Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
1113e1b5 |
|
22-Jul-2010 |
Patrick J. LoPresti <lopresti@gmail.com> |
JBD2: Allow feature checks before journal recovery Before we start accessing a huge (> 16 TiB) OCFS2 volume, we need to confirm that its journal supports 64-bit offsets. In particular, we need to check the journal's feature bits before recovering the journal. This is not possible with JBD2 at present, because the journal superblock (where the feature bits reside) is not loaded from disk until the journal is recovered. This patch loads the journal superblock in jbd2_journal_check_used_features() if it has not already been loaded, allowing us to check the feature bits before journal recovery. Signed-off-by: Patrick LoPresti <lopresti@gmail.com> Cc: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
9cb569d6 |
|
11-Aug-2010 |
Christoph Hellwig <hch@lst.de> |
remove SWRITE* I/O types These flags aren't real I/O types, but tell ll_rw_block to always lock the buffer instead of giving up on a failed trylock. Instead add a new write_dirty_buffer helper that implements this semantic and use it from the existing SWRITE* callers. Note that the ll_rw_block code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which this patch fixes. In the ufs code clean up the helper that used to call ll_rw_block to mirror sync_dirty_buffer, which is the function it implements for compound buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
a931da6a |
|
03-Aug-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Change j_state_lock to be a rwlock_t Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
47def826 |
|
27-Jul-2010 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Remove __GFP_NOFAIL from jbd2 layer __GFP_NOFAIL is going away, so add our own retry loop. Also add jbd2__journal_start() and jbd2__journal_restart() which take a gfp mask, so that file systems can optionally (re)start transaction handles using GFP_KERNEL. If they do this, then they need to be prepared to handle receiving an PTR_ERR(-ENOMEM) error, and be ready to reflect that error up to userspace. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
13ceef09 |
|
13-Jul-2010 |
Jan Kara <jack@suse.cz> |
jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
5a0790c2 |
|
14-Jun-2010 |
Andi Kleen <andi@firstfloor.org> |
ext4: remove initialized but not read variables No real bugs found, just removed some dead code. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8ac97b74 |
|
30-Apr-2010 |
Bill Pemberton <wfp5p@virginia.edu> |
jbd2: use NULL instead of 0 when pointer is needed Fixes sparse warning: fs/jbd2/journal.c:1892:9: warning: Using plain integer as NULL pointer Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> CC: linux-ext4@vger.kernel.org Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
#
3ebfdf88 |
|
23-Dec-2009 |
Andrew Morton <akpm@linux-foundation.org> |
jbd2: don't use __GFP_NOFAIL in journal_init_common() It triggers the warning in get_page_from_freelist(), and it isn't appropriate to use __GFP_NOFAIL here anyway. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14843 Reported-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
765f8361 |
|
15-Dec-2009 |
Yin Kangkai <kangkai.yin@intel.com> |
jbd: jbd-debug and jbd2-debug should be writable jbd-debug and jbd2-debug is currently read-only (S_IRUGO), which is not correct. Make it writable so that we can start debuging. Signed-off-by: Yin Kangkai <kangkai.yin@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
3b799d15 |
|
09-Dec-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Export jbd2_log_start_commit to fix ext4 build This fixes: ERROR: "jbd2_log_start_commit" [fs/ext4/ext4.ko] undefined! Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d2eecb03 |
|
07-Dec-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4: Use slab allocator for sub-page sized allocations Now that the SLUB seems to be fixed so that it respects the requested alignment, use kmem_cache_alloc() to allocator if the block size of the buffer heads to be allocated is less than the page size. Previously, we were using 16k page on a Power system for each buffer, even when the file system was using 1k or 4k block size. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e6ec116b |
|
01-Dec-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Add ENOMEM checking in and for jbd2_journal_write_metadata_buffer() OOM happens. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e6a47428 |
|
15-Nov-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: don't wipe the journal on a failed journal checksum If there is a failed journal checksum, don't reset the journal. This allows for userspace programs to decide how to recover from this situation. It may be that ignoring the journal checksum failure might be a better way of recovering the file system. Once we add per-block checksums, we can definitely do better. Until then, a system administrator can try backing up the file system image (or taking a snapshot) and and trying to determine experimentally whether ignoring the checksum failure or aborting the journal replay results in less data loss. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
|
#
7b02bec0 |
|
10-Nov-2009 |
Tao Ma <tao.ma@oracle.com> |
JBD/JBD2: free j_wbuf if journal init fails. If journal init fails, we need to free j_wbuf. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
828c0950 |
|
01-Oct-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
const: constify remaining file_operations [akpm@linux-foundation.org: fix KVM] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bf699327 |
|
29-Sep-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Use tracepoints for history file The /proc/fs/jbd2/<dev>/history was maintained manually; by using tracepoints, we can get all of the existing functionality of the /proc file plus extra capabilities thanks to the ftrace infrastructure. We save memory as a bonus. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
90576c0b |
|
29-Sep-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4, jbd2: Drop unneeded printks at mount and unmount time There are a number of kernel printk's which are printed when an ext4 filesystem is mounted and unmounted. Disable them to economize space in the system logs. In addition, disabling the mballoc stats by default saves a number of unneeded atomic operations for every block allocation or deallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
88e9d34c |
|
22-Sep-2009 |
James Morris <jmorris@namei.org> |
seq_file: constify seq_operations Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
96577c43 |
|
13-Jul-2009 |
dingdinghua <dingdinghua85@gmail.com> |
jbd2: fix race between write_metadata_buffer and get_write_access The function jbd2_journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
f6f50e28 |
|
17-Jul-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Fail to load a journal if it is too short Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
b5744805 |
|
20-Jun-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Remove GFP_ATOMIC kmalloc from inside spinlock critical region Fix jbd2_dev_to_name(), a function used when pretty-printting jbd2 and ext4 tracepoints. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
bfcd3555 |
|
08-Jun-2009 |
Alberto Bertogli <albertito@blitiri.com.ar> |
jbd2: Fix minor typos in comments in fs/jbd2/journal.c Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
879c5e6b |
|
17-Jun-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: convert instrumentation from markers to tracepoints Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
c88ccea3 |
|
10-Feb-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Fix return value of jbd2_journal_start_commit() The function jbd2_journal_start_commit() returns 1 if either a transaction is committing or the function has queued a transaction commit. But it returns 0 if we raced with somebody queueing the transaction commit as well. This resulted in ext4_sync_fs() not functioning correctly (description from Arthur Jones): In the case of a data=ordered umount with pending long symlinks which are delayed due to a long list of other I/O on the backing block device, this causes the buffer associated with the long symlinks to not be moved to the inode dirty list in the second phase of fsync_super. Then, before they can be dirtied again, kjournald exits, seeing the UMOUNT flag and the dirty pages are never written to the backing block device, causing long symlink corruption and exposing new or previously freed block data to userspace. This can be reproduced with a script created by Eric Sandeen <sandeen@redhat.com>: #!/bin/bash umount /mnt/test2 mount /dev/sdb4 /mnt/test2 rm -f /mnt/test2/* dd if=/dev/zero of=/mnt/test2/bigfile bs=1M count=512 touch /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename ln -s /mnt/test2/thisisveryveryveryveryveryveryveryveryveryveryveryveryveryveryveryverylongfilename /mnt/test2/link umount /mnt/test2 mount /dev/sdb4 /mnt/test2 ls /mnt/test2/ This patch fixes jbd2_journal_start_commit() to always return 1 when there's a transaction committing or queued for commit. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> CC: Eric Sandeen <sandeen@redhat.com> CC: linux-ext4@vger.kernel.org
|
#
c225aa57 |
|
11-Jan-2009 |
Simon Holm Thøgersen <odie@cs.aau.dk> |
ext4: fix wrong use of do_div the following warning: fs/jbd2/journal.c: In function ‘jbd2_seq_info_show’: fs/jbd2/journal.c:850: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘uint32_t’ is caused by wrong usage of do_div that modifies the dividend in-place and returns the quotient. So not only would an incorrect value be displayed, but s->journal->j_average_commit_time would also be changed to a wrong value! Fix it by using div_u64 instead. Signed-off-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
4b905671 |
|
06-Jan-2009 |
Jan Kara <jack@suse.cz> |
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs On 32-bit system with CONFIG_LBD getblk can fail because provided block number is too big. Add error checks so we fail gracefully if getblk() returns NULL (which can also happen on memory allocation failures). Thanks to David Maciejak from Fortinet's FortiGuard Global Security Research Team for reporting this bug. http://bugzilla.kernel.org/show_bug.cgi?id=12370 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> cc: stable@kernel.org
|
#
e06c8227 |
|
11-Sep-2008 |
Joel Becker <joel.becker@oracle.com> |
jbd2: Add buffer triggers Filesystems often to do compute intensive operation on some metadata. If this operation is repeated many times, it can be very expensive. It would be much nicer if the operation could be performed once before a buffer goes to disk. This adds triggers to jbd2 buffer heads. Just before writing a metadata buffer to the journal, jbd2 will optionally call a commit trigger associated with the buffer. If the journal is aborted, an abort trigger will be called on any dirty buffers as they are dropped from pending transactions. ocfs2 will use this feature. Initially I tried to come up with a more generic trigger that could be used for non-buffer-related events like transaction completion. It doesn't tie nicely, because the information a buffer trigger needs (specific to a journal_head) isn't the same as what a transaction trigger needs (specific to a tranaction_t or perhaps journal_t). So I implemented a buffer set, with the understanding that journal/transaction wide triggers should be implemented separately. There is only one trigger set allowed per buffer. I can't think of any reason to attach more than one set. Contrast this with a journal or transaction in which multiple places may want to watch the entire transaction separately. The trigger sets are considered static allocation from the jbd2 perspective. ocfs2 will just have one trigger set per block type, setting the same set on every bh of the same type. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
4a9bf99b |
|
03-Jan-2009 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Add pid and journal device name to the "kjournald2 starting" message Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
c3191067 |
|
06-Jan-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4: Remove code to create the journal inode This code has been obsolete in quite some time, since the supported method for adding a journal inode is to use tune2fs (or to creating new filesystem with a journal via mke2fs or mkfs.ext4). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
1a0d3786 |
|
04-Nov-2008 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: Remove a large array of bh's from the stack of the checkpoint routine jbd2_log_do_checkpoint()n is one of the kernel's largest stack users. Move the array of buffer head's from the stack of jbd2_log_do_checkpoint() to the in-core journal structure. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
30773840 |
|
03-Jan-2009 |
Theodore Ts'o <tytso@mit.edu> |
ext4: add fsync batch tuning knobs Add new mount options, min_batch_time and max_batch_time, which controls how long the jbd2 layer should wait for additional filesystem operations to get batched with a synchronous write transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d7cfa468 |
|
16-Dec-2008 |
Theodore Ts'o <tytso@mit.edu> |
ext4: display average commit time Display the average commit time (which is used by the ext4 fsync batching patch) in /proc/fs/jbd2/*/info for performance tuning purposes. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
2423840d |
|
02-Nov-2008 |
Sami Liedes <sliedes@cc.hut.fi> |
jbd2: deregister proc on failure in jbd2_journal_init_inode jbd2_journal_init_inode() does not call jbd2_stats_proc_exit() on all failure paths after calling jbd2_stats_proc_init(). This leaves dangling references to the fs in proc. This patch fixes a bug reported by Sami Leides at: http://bugzilla.kernel.org/show_bug.cgi?id=11493 Signed-off-by: Sami Liedes <sliedes@cc.hut.fi> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
44519faf |
|
10-Oct-2008 |
Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> |
jbd2: fix error handling for checkpoint io When a checkpointing IO fails, current JBD2 code doesn't check the error and continue journaling. This means latest metadata can be lost from both the journal and filesystem. This patch leaves the failed metadata blocks in the journal space and aborts journaling in the case of jbd2_log_do_checkpoint(). To achieve this, we need to do: 1. don't remove the failed buffer from the checkpoint list where in the case of __try_to_free_cp_buf() because it may be released or overwritten by a later transaction 2. jbd2_log_do_checkpoint() is the last chance, remove the failed buffer from the checkpoint list and abort the journal 3. when checkpointing fails, don't update the journal super block to prevent the journaled contents from being cleaned. For safety, don't update j_tail and j_tail_sequence either 4. when checkpointing fails, notify this error to the ext4 layer so that ext4 don't clear the needs_recovery flag, otherwise the journaled contents are ignored and cleaned in the recovery phase 5. if the recovery fails, keep the needs_recovery flag 6. prevent jbd2_cleanup_journal_tail() from being called between __jbd2_journal_drop_transaction() and jbd2_journal_abort() (a possible race issue between jbd2_log_do_checkpoint()s called by jbd2_journal_flush() and __jbd2_log_wait_for_space()) Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
#
914258bf |
|
06-Oct-2008 |
Theodore Ts'o <tytso@mit.edu> |
ext4/jbd2: Avoid WARN() messages when failing to write to the superblock This fixes some very common warnings reported by kerneloops.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
05496769 |
|
16-Sep-2008 |
Theodore Ts'o <tytso@mit.edu> |
jbd2: clean up how the journal device name is printed Calculate the journal device name once and stash it away in the journal_s structure. This avoids needing to call bdevname() everywhere and reduces stack usage by not needing to allocate an on-stack buffer. In addition, we eliminate the '/' that can appear in device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that can cause problems when creating proc directory names, and include the inode number to support ocfs2 which creates multiple journals with different inode numbers. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
00b32b7f |
|
26-Jul-2008 |
Theodore Ts'o <tytso@mit.edu> |
ext4: unexport jbd2_journal_update_superblock Remove the unused EXPORT_SYMBOL(jbd2_journal_update_superblock). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
87c89c23 |
|
11-Jul-2008 |
Jan Kara <jack@suse.cz> |
jbd2: Remove data=ordered mode support using jbd buffer heads Signed-off-by: Jan Kara <jack@suse.cz>
|
#
c851ed54 |
|
11-Jul-2008 |
Jan Kara <jack@suse.cz> |
jbd2: Implement data=ordered mode handling via inodes This patch adds necessary framework into JBD2 to be able to track inodes with each transaction and write-out their dirty data during transaction commit time. This new ordered mode brings all sorts of advantages such as possibility to get rid of journal heads and buffer heads for data buffers in ordered mode, better ordering of writes on transaction commit, simplification of some JBD code, no more anonymous pages when truncate of data being committed happens. Also with this new ordered mode, delayed allocation on ordered mode is much simpler. Signed-off-by: Jan Kara <jack@suse.cz>
|
#
f36f21ec |
|
12-May-2008 |
Jean Delvare <khali@linux-fr.org> |
Fix misuses of bdevname() bdevname() fills the buffer that it is given as a parameter, so calling strcpy() or snprintf() on the returned value is redundant (and probably not guaranteed to work - I don't think strcpy and snprintf support overlapping buffers.) Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Stephen Tweedie <sct@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
79da3664 |
|
29-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
jbd2: use non-racy method for proc entries creation Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data be setup before gluing PDE to main tree. Signed-off-by: Denis V. Lunev <den@openvz.org> Cc: <linux-ext4@vger.kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
329d291f |
|
17-Apr-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
jdb2: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
620de4e1 |
|
29-Apr-2008 |
Duane Griffin <duaneg@dghda.com> |
jbd2: only create debugfs and stats entries if init is successful jbd2 debugfs and stats entries should only be created if cache initialisation is successful. At the moment they are being created unconditionally which will leave them dangling if cache (and hence module) initialisation fails. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
5648ba5b |
|
17-Apr-2008 |
Randy Dunlap <randy.dunlap@oracle.com> |
jbd2: fix kernel-doc notation Fix kernel-doc notation in jbd2. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
8a9362eb |
|
17-Apr-2008 |
Duane Griffin <duaneg@dghda.com> |
jbd2: replace potentially false assertion with if block If an error occurs during jbd2 cache initialisation it is possible for the journal_head_cache to be NULL when jbd2_journal_destroy_journal_head_cache is called. Replace the J_ASSERT with an if block to handle the situation correctly. Note that even with this fix things will break badly if jbd2 is statically compiled in and cache initialisation fails. Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
1076d17a |
|
28-Mar-2008 |
Al Viro <viro@ftp.linux.org.uk> |
jbd/jbd2 NULL noise Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
77160957 |
|
28-Jan-2008 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: Mark jbd2 slabs as SLAB_TEMPORARY This patch marks slab allocations by jbd2 as short-lived in support of Mel Gorman's "Group short-lived and reclaimable kernel allocations" patch. (Ported from similar changes made to fs/jbd/journal.c and fs/jbd/revoke.c in Mel's patch.) Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
818d276c |
|
28-Jan-2008 |
Girish Shilamkar <girish@clusterfs.com> |
ext4: Add the journal checksum feature The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Girish Shilamkar <girish@clusterfs.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
8e85fb3f |
|
28-Jan-2008 |
Johann Lombardi <johann.lombardi@bull.net> |
jbd2: jbd2 stats through procfs The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [root@bob ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R 261 8260 2720 0 0 750 9892 8170 8187 C 259 750 0 4885 1 R 262 20 2200 10 0 770 9836 8170 8187 R 263 30 2200 10 0 3070 9812 8170 8187 R 264 0 5000 10 0 1340 0 0 0 C 261 8240 3212 4957 0 R 265 8260 1470 0 0 4640 9854 8170 8187 R 266 0 5000 10 0 1460 0 0 0 C 262 8210 2989 4868 0 R 267 8230 1490 10 0 4440 9875 8171 8188 R 268 0 5000 10 0 1260 0 0 0 C 263 7710 2937 4908 0 R 269 7730 1470 10 0 3330 9841 8170 8187 R 270 0 5000 10 0 830 0 0 0 C 265 8140 3234 4898 0 C 267 720 0 4849 1 R 271 8630 2740 20 0 740 9819 8170 8187 C 269 800 0 4214 1 R 272 40 2170 10 0 830 9716 8170 8187 R 273 40 2280 0 0 3530 9799 8170 8187 R 274 0 5000 10 0 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [root@bob ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
|
#
6f38c74f |
|
16-Oct-2007 |
Jose R. Santos <jrs@us.ibm.com> |
JBD2: debug code cleanup. Mostly stolen from akpm's JBD cleanup patch. - use `#ifdef foo' instead of `#if defined(foo)' - Make journal_enable_debug __read_mostly just for the heck of it - Make jbd_debugfs_dir and jbd_debug static - debugfs_remove(NULL) is legal: remove unneeded tests - remove unnecessary empty loops Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
cd02ff0b |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
jbd2: JBD_XXX to JBD2_XXX naming cleanup change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
d802ffa8 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2/Ext4: Convert kmalloc to kzalloc in jbd2/ext4 Convert kmalloc to kzalloc() and get rid of the memset(). Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
2d917969 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2: replace jbd_kmalloc with kmalloc directly. This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
af1e76d6 |
|
16-Oct-2007 |
Mingming Cao <cmm@us.ibm.com> |
JBD2: jbd2 slab allocation cleanups JBD2: Replace slab allocations with page allocations JBD2 allocate memory for committed_data and frozen_data from slab. However JBD2 should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
|
#
20c2df83 |
|
19-Jul-2007 |
Paul Mundt <lethal@linux-sh.org> |
mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
#
0f49d5d0 |
|
18-Jul-2007 |
Jose R. Santos <jrs@us.ibm.com> |
jbd2: Move jbd2-debug file to debugfs The jbd2-debug file used to be located in /proc/sys/fs/jbd2-debug, but it incorrectly used create_proc_entry() instead of the sysctl routines, and no proc entry was ever created. Instead of fixing this we might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd2/jbd2-debug. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
e23291b9 |
|
18-Jul-2007 |
Jose R. Santos <jrs@us.ibm.com> |
jbd2: Fix CONFIG_JBD_DEBUG ifdef to be CONFIG_JBD2_DEBUG When the JBD code was forked to create the new JBD2 code base, the references to CONFIG_JBD_DEBUG where never changed to CONFIG_JBD2_DEBUG. This patch fixes that. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
97f06784 |
|
08-May-2007 |
Pavel Emelianov <xemul@sw.ru> |
jbd: check for error returned by kthread_create on creating journal thread If the thread failed to create the subsequent wait_event will hang forever. This is likely to happen if kernel hits max_threads limit. Will be critical for virtualization systems that limit the number of tasks and kernel memory usage within the container. (akpm: JBD should be converted fully to the kthread API: kthread_should_stop() and kthread_stop()). Cc: <linux-ext4@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e63340ae |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7dfb7103 |
|
06-Dec-2006 |
Nigel Cunningham <ncunningham@linuxmail.org> |
[PATCH] Add include/linux/freezer.h and move definitions from sched.h Move process freezing functions from include/linux/sched.h to freezer.h, so that modifications to the freezer or the kernel configuration don't require recompiling just about everything. [akpm@osdl.org: fix ueagle driver] Signed-off-by: Nigel Cunningham <nigel@suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
e18b890b |
|
06-Dec-2006 |
Christoph Lameter <clameter@sgi.com> |
[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
5eb30790 |
|
17-Oct-2006 |
Dave Kleikamp <shaggy@austin.ibm.com> |
[PATCH] null dereference in fs/jbd2/journal.c This is Eric Sesterhenn's jbd patch applied to jbd2. Commit: 41716c7c21b15e7ecf14f0caf1eef3980707fb74 His words: Since commit d1807793e1e7e502e3dc047115e9dbc3b50e4534 we dereference a NULL pointer. Coverity id #1432. We set journal to NULL, and use it directly afterwards. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
18eba7aa |
|
11-Oct-2006 |
Mingming Cao <cmm@us.ibm.com> |
[PATCH] jbd2: switch blks_type from sector_t to ull Similar to ext4, change blocks in JBD2 from sector_t to unsigned long long. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
29971769 |
|
11-Oct-2006 |
Mingming Cao <cmm@us.ibm.com> |
[PATCH] jbd2: sector_t conversion JBD layer in-kernel block varibles type fixes to support >32 bit block number and convert to sector_t type. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b517bea1 |
|
11-Oct-2006 |
Zach Brown <zach.brown@oracle.com> |
[PATCH] 64-bit jbd2 core Here is the patch to JBD to handle 64 bit block numbers, originally from Zach Brown. This patch is useful only after adding support for 64-bit block numbers in the filesystem. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
a920e941 |
|
11-Oct-2006 |
Johann Lombardi <johann.lombardi@bull.net> |
[PATCH] jbd2: rename slab jbd and jbd2 currently use the same slab names which must be unique. The patch below just renames jbd2's slabs. Signed-off-by: Johann Lombardi <johann.lombardi@bull.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
f7f4bccb |
|
11-Oct-2006 |
Mingming Cao <cmm@us.ibm.com> |
[PATCH] jbd2: rename jbd2 symbols to avoid duplication of jbd symbols Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
470decc6 |
|
11-Oct-2006 |
Dave Kleikamp <shaggy@austin.ibm.com> |
[PATCH] jbd2: initial copy of files from jbd This is a simple copy of the files in fs/jbd to fs/jbd2 and /usr/incude/linux/[ext4_]jbd.h to /usr/include/[ext4_]jbd2.h Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|