#
bfc7fbe3 |
|
15-Dec-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ocfs2: remove writepage implementation If the filesystem implements migrate_folio and writepages, there is no need for a writepage implementation. Link: https://lkml.kernel.org/r/20231215200245.748418-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
af7628d6 |
|
17-Nov-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: convert error_remove_page to error_remove_folio There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
0a88810d |
|
16-Oct-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
buffer: remove folio_create_empty_buffers() With all users converted, remove the old create_empty_buffers() and rename folio_create_empty_buffers() to create_empty_buffers(). Link: https://lkml.kernel.org/r/20231016201114.1928083-28-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
414ae0a4 |
|
16-Oct-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ocfs2: convert ocfs2_map_page_blocks to use a folio Convert the page argument to a folio and then use the folio APIs throughout. Replaces three hidden calls to compound_head() with one explicit one. Link: https://lkml.kernel.org/r/20231016201114.1928083-22-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Pankaj Raghav <p.raghav@samsung.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
fd6acbbc |
|
04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
ocfs2: convert to new timestamp accessors Convert to using the new inode timestamp accessor functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185347.80880-54-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
6861de97 |
|
05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
ocfs2: convert to ctime accessor functions In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-60-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
|
#
0aaf08de |
|
19-Jan-2023 |
Al Viro <viro@zeniv.linux.org.uk> |
__blockdev_direct_IO(): get rid of submit_io callback always NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
90410bcf |
|
02-Mar-2023 |
Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com> |
ocfs2: fix data corruption after failed write When buffered write fails to copy data into underlying page cache page, ocfs2_write_end_nolock() just zeroes out and dirties the page. This can leave dirty page beyond EOF and if page writeback tries to write this page before write succeeds and expands i_size, page gets into inconsistent state where page dirty bit is clear but buffer dirty bits stay set resulting in page data never getting written and so data copied to the page is lost. Fix the problem by invalidating page beyond EOF after failed write. Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
54d9171d |
|
01-Sep-2022 |
Zhang Yi <yi.zhang@huawei.com> |
ocfs2: replace ll_rw_block() ll_rw_block() is not safe for the sync read path because it cannot guarantee that read IO is submitted if the buffer is already locked. We could get a false positive EIO after wait_on_buffer() if the buffer has been locked by others. So stop using ll_rw_block() in ocfs2. Link: https://lkml.kernel.org/r/20220901133505.2510834-9-yi.zhang@huawei.com Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
67235182 |
|
06-Jun-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() Use a folio throughout __buffer_migrate_folio(), add kernel-doc for buffer_migrate_folio() and buffer_migrate_folio_norefs(), move their declarations to buffer.h and switch all filesystems that have wired them up. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
9bb88987 |
|
29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ocfs2: Convert ocfs2_read_folio() to use a folio Use the folio API throughout. There are a few places where we convert back to a page to call into the rest of the filesystem, so folio usage needs to be pushed down to those functions later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
1420c4a5 |
|
14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
fs/buffer: Combine two submit_bh() and ll_rw_block() arguments Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Acked-by: Song Liu <song@kernel.org> (for the md changes) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
68189fef |
|
30-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Change try_to_free_buffers() to take a folio All but two of the callers already have a folio; pass a folio into try_to_free_buffers(). This removes the last user of cancel_dirty_page() so remove that wrapper function too. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
eca66389 |
|
30-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ocfs2: Convert to release_folio Use folios throughout the release_folio path. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
|
#
bb9263fc |
|
29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
ocfs2: Convert ocfs2 to read_folio This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
2c69e205 |
|
29-Apr-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Convert block_read_full_page() to block_read_full_folio() This function is NOT converted to handle large folios, so include an assert that the filesystem isn't passing one in. Otherwise, use the folio functions instead of the page functions, where they exist. Convert all filesystems which use block_read_full_page(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
|
#
9d6b0cd7 |
|
22-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Remove flags parameter from aops->write_begin There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
|
#
137cebf9 |
|
22-Mar-2022 |
hongnanli <hongnan.li@linux.alibaba.com> |
fs/ocfs2: fix comments mentioning i_mutex inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix comments still mentioning i_mutex. Link: https://lkml.kernel.org/r/20220214031314.100094-1-hongnan.li@linux.alibaba.com Signed-off-by: hongnanli <hongnan.li@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e621900a |
|
09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Convert __set_page_dirty_buffers to block_dirty_folio Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
|
#
7ba13abb |
|
09-Feb-2022 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Turn block_invalidatepage into block_invalidate_folio Remove special-casing of a NULL invalidatepage, since there is no more block_invalidatepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs
|
#
e07bf00c |
|
14-Jan-2022 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
ocfs2: clearly handle ocfs2_grab_pages_for_write() return value ocfs2_grab_pages_for_write() may return -EAGAIN if write context type is mmap and it could not lock the target page. In this case, we exit with no error and no target page. And then trigger the caller page_mkwrite() to retry. Since there are other caller types, e.g. buffer and direct io, make the return value handling more clear. Link: https://lkml.kernel.org/r/20211206065051.103353-1-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0af57378 |
|
28-Jun-2021 |
Christoph Hellwig <hch@lst.de> |
mm: require ->set_page_dirty to be explicitly wired up Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire that method up for the missing instances. [hch@lst.de: ecryptfs: add a ->set_page_dirty cludge] Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
01f01399 |
|
28-Jun-2021 |
Wan Jiabing <wanjiabing@vivo.com> |
ocfs2: remove repeated uptodate check for buffer In commit 60f91826ca62 ("buffer: Avoid setting buffer bits that are already set"), a test_bit() check was added to set_buffer_##name, which is the same test that buffer_##name performs. The !buffer_uptodate(bh) check here is therefore redundant. Remove it. Link: https://lkml.kernel.org/r/20210425025702.13628-1-wanjiabing@vivo.com Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
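The redundancy this commit removes can be modelled in userspace. The following is only a sketch: `struct buffer_head` is a hypothetical one-field stand-in, the bit numbers are arbitrary, and plain loads and stores replace the kernel's atomic bitops. The two `mark_uptodate*` functions are illustrative names, not ocfs2 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct buffer_head. */
struct buffer_head {
    unsigned long b_state;
};

/* Bit numbers chosen for this sketch only. */
enum bh_state_bits { BH_Uptodate, BH_Dirty };

/*
 * Generate buffer_<name>() and set_buffer_<name>() in the style the commit
 * message describes: the setter tests the bit first and writes only when
 * the bit is still clear. Not atomic, unlike the kernel versions.
 */
#define BUFFER_FNS(bit, name)                                      \
static inline bool buffer_##name(const struct buffer_head *bh)     \
{                                                                  \
    return (bh->b_state >> BH_##bit) & 1UL;                        \
}                                                                  \
static inline void set_buffer_##name(struct buffer_head *bh)       \
{                                                                  \
    if (!buffer_##name(bh))                                        \
        bh->b_state |= 1UL << BH_##bit;                            \
}

BUFFER_FNS(Uptodate, uptodate)
BUFFER_FNS(Dirty, dirty)

/* Pattern before the cleanup: the outer test repeats the setter's own test. */
void mark_uptodate_redundant(struct buffer_head *bh)
{
    assert(bh != NULL);
    if (!buffer_uptodate(bh))
        set_buffer_uptodate(bh);
}

/* Pattern after the cleanup: same effect, one test fewer. */
void mark_uptodate(struct buffer_head *bh)
{
    assert(bh != NULL);
    set_buffer_uptodate(bh);
}
```

Both functions leave `b_state` in the same state for any input, which is why the outer check could be dropped without a behavior change.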
|
#
fa60ce2c |
|
06-May-2021 |
Masahiro Yamada <masahiroy@kernel.org> |
treewide: remove editor modelines and cruft The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently received a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favorite editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
90bd070a |
|
09-Apr-2021 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: fix deadlock between setattr and dio_end_io_write
The following deadlock is detected:
truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).
PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9"
  #0 __schedule at ffffffff818667cc
  #1 schedule at ffffffff81866de6
  #2 inode_dio_wait at ffffffff812a2d04
  #3 ocfs2_setattr at ffffffffc05f322e [ocfs2]
  #4 notify_change at ffffffff812a5a09
  #5 do_truncate at ffffffff812808f5
  #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2
  #7 sys_ftruncate at ffffffff81280d8e
  #8 do_syscall_64 at ffffffff81003949
  #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad
dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem:
  #0 __schedule+700 at ffffffff818667cc
  #1 schedule+54 at ffffffff81866de6
  #2 rwsem_down_write_failed+536 at ffffffff8186aa28
  #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7
  #4 down_write+45 at ffffffff81869c9d
  #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
  #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
  #7 dio_complete+140 at ffffffff812c873c
  #8 dio_aio_complete_work+25 at ffffffff812c89f9
  #9 process_one_work+361 at ffffffff810b1889
  #10 worker_thread+77 at ffffffff810b233d
  #11 kthread+261 at ffffffff810b7fd5
  #12 ret_from_fork+62 at ffffffff81a0035e
Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit 28f5a8a7c033 ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party.
End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path.
This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications.
[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d4388340 |
|
01-Jun-2020 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: convert mpage_readpages to mpage_readahead Implement the new readahead aop and convert all callers (block_dev, exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6, reiserfs & udf). The callers are all trivial except for GFS2 & OCFS2. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> # ocfs2 Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> # ocfs2 Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Link: http://lkml.kernel.org/r/20200414150233.24495-17-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d055b4fb |
|
25-Sep-2019 |
David Howells <dhowells@redhat.com> |
pipe: Reduce #inclusion of pipe_fs_i.h Remove some #inclusions of linux/pipe_fs_i.h that don't seem to be necessary any more. Signed-off-by: David Howells <dhowells@redhat.com>
|
#
583fee3e |
|
06-Oct-2019 |
Jia-Ju Bai <baijiaju1990@gmail.com> |
fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() In ocfs2_write_end_nolock(), there are if statements on lines 1976, 2047 and 2058 that check whether handle is NULL: if (handle) When handle is NULL, it is used on line 2045: ocfs2_update_inode_fsync_trans(handle, inode, 1); oi->i_sync_tid = handle->h_transaction->t_tid; Thus, a possible null-pointer dereference may occur. To fix this bug, handle is checked before calling ocfs2_update_inode_fsync_trans(). This bug was found by STCheck, a static analysis tool written by us. Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
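The shape of this fix can be sketched in userspace as follows. All types here are hypothetical, heavily reduced stand-ins for the jbd2 and ocfs2 structures, and the function names are illustrative. Only the `handle->h_transaction->t_tid` access and the `if (handle)` guard come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the jbd2 and ocfs2 types. */
struct transaction { unsigned int t_tid; };
struct handle      { struct transaction *h_transaction; };
struct inode_info  { unsigned int i_sync_tid; };

/*
 * Dereferences handle unconditionally, like the helper quoted in the
 * commit message. Calling it with a NULL handle would crash.
 */
void update_inode_fsync_trans(struct handle *handle, struct inode_info *oi)
{
    oi->i_sync_tid = handle->h_transaction->t_tid;
}

/* Caller-side guard: skip the update when no transaction was started. */
void write_end_update(struct handle *handle, struct inode_info *oi)
{
    assert(oi != NULL);
    if (handle)
        update_inode_fsync_trans(handle, oi);
}
```

Calling `write_end_update(NULL, &oi)` leaves `oi.i_sync_tid` untouched, while a call with a valid handle copies the transaction id.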
|
#
7a243c82 |
|
06-Oct-2019 |
Jia Guo <guojia12@huawei.com> |
ocfs2: clear zero in unaligned direct IO The unused portion of a partially written fs-block-sized block is not zeroed in an unaligned append direct write. This can lead to serious data inconsistencies. Ocfs2 manages the disk in clusters (for example, 1M). A partial write within one cluster changes the cluster state from UN-WRITTEN to WRITTEN, and the VFS (function dio_zero_block) doesn't do the cleaning because the bh's state is not set to NEW in function ocfs2_dio_wr_get_block when we write a WRITTEN cluster. For example, the cluster size is 1M, file size is 8k and we direct write from 14k to 15k, then 12k~14k and 15k~16k will contain dirty data. We have to deal with two cases: 1. The starting position of the direct write is outside the file. 2. The starting position of the direct write is located in the file. We need to set the bh's state to NEW in the first case. In the second case, we need to map twice, because the bh's state for the area outside the file should be set to NEW while the area inside the file should not. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com Signed-off-by: Jia Guo <guojia12@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
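The arithmetic behind the 14k to 15k example can be reproduced with a small helper. This is a sketch under one assumption not spelled out in the commit message: a 4k filesystem block size, which is what the quoted 12k~14k and 15k~16k ranges imply. `struct byte_range` and `unwritten_edges()` are hypothetical names, and the sketch does not model i_size or the NEW buffer state.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct byte_range { uint64_t start, end; };

/* Round down to a power-of-two block size. */
static inline uint64_t block_floor(uint64_t pos, uint64_t bs)
{
    return pos & ~(bs - 1);
}

/* Round up to a power-of-two block size. */
static inline uint64_t block_ceil(uint64_t pos, uint64_t bs)
{
    return (pos + bs - 1) & ~(bs - 1);
}

/*
 * For a write covering [pos, pos + len), compute the parts of the first
 * and last filesystem block that the write itself does not cover. When
 * the write lies beyond EOF, these are the bytes that must be zeroed.
 */
void unwritten_edges(uint64_t pos, uint64_t len, uint64_t bs,
                     struct byte_range *head, struct byte_range *tail)
{
    assert(bs != 0 && (bs & (bs - 1)) == 0);   /* power of two only */
    head->start = block_floor(pos, bs);
    head->end   = pos;
    tail->start = pos + len;
    tail->end   = block_ceil(pos + len, bs);
}
```

With `pos = 14 * 1024`, `len = 1024` and `bs = 4096`, the head range is 12k to 14k and the tail range is 15k to 16k, matching the two regions the commit message says would otherwise contain stale data.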
|
#
bbd0f327 |
|
23-Sep-2019 |
Joseph Qi <joseph.qi@linux.alibaba.com> |
ocfs2: use jbd2_inode dirty range scoping 6ba0e7dc64a5 ("jbd2: introduce jbd2_inode dirty range scoping") allows us to scope each of the inode dirty ranges associated with a given transaction, and ext4 already does it this way. Now let's also use the newly introduced jbd2_inode dirty range scoping to prevent us from waiting forever when trying to complete a journal transaction in ocfs2. Link: http://lkml.kernel.org/r/1562977611-8412-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Reviewed-by: Changwei Ge <chge@linux.alibaba.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
328970de |
|
23-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 145 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 021110 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 84 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Richard Fontana <rfontana@redhat.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190524100844.756442981@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
f86196ea |
|
03-Jan-2019 |
Nikolay Borisov <nborisov@suse.com> |
fs: don't open code lru_to_page() Multiple filesystems open code lru_to_page(). Rectify this by moving the macro from mm_inline (which is specific to lru stuff) to the more generic mm.h header and start using the macro where appropriate. No functional changes. Link: http://lkml.kernel.org/r/20181129104810.23361-1-nborisov@suse.com Link: https://lkml.kernel.org/r/20181129075301.29087-1-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Pankaj gupta <pagupta@redhat.com> Acked-by: "Yan, Zheng" <zyan@redhat.com> [ceph] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
5040f8df |
|
16-Nov-2018 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: free up write context when direct IO failed The write context should also be freed even when direct IO failed. Otherwise a memory leak is introduced and entries remain in oi->ip_unwritten_list causing the following BUG later in unlink path: ERROR: bug expression: !list_empty(&oi->ip_unwritten_list) ERROR: Clear inode of 215043, inode has unwritten extents ... Call Trace: ? __set_current_blocked+0x42/0x68 ocfs2_evict_inode+0x91/0x6a0 [ocfs2] ? bit_waitqueue+0x40/0x33 evict+0xdb/0x1af iput+0x1a2/0x1f7 do_unlinkat+0x194/0x28f SyS_unlinkat+0x1b/0x2f do_syscall_64+0x79/0x1ae entry_SYSCALL_64_after_hwframe+0x151/0x0 This patch also logs, with frequency limit, direct IO failures. Link: http://lkml.kernel.org/r/20181102170632.25921-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Changwei Ge <ge.changwei@h3c.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0ae1c2db |
|
26-Oct-2018 |
Ding Xiang <dingxiang@cmss.chinamobile.com> |
ocfs2: remove unneeded null check Null check for kfree is unnecessary, so remove it. Link: http://lkml.kernel.org/r/1535704514-26559-1-git-send-email-dingxiang@cmss.chinamobile.com Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d324cd4c |
|
05-Apr-2018 |
piaojun <piaojun@huawei.com> |
ocfs2: use 'oi' instead of 'OCFS2_I()' We could use 'oi' instead of 'OCFS2_I()' to make code more elegant. Link: http://lkml.kernel.org/r/5A7020FE.5050906@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1119d3c0 |
|
05-Apr-2018 |
piaojun <piaojun@huawei.com> |
ocfs2: use 'osb' instead of 'OCFS2_SB()' We could use 'osb' instead of 'OCFS2_SB()' to make code more elegant. Link: http://lkml.kernel.org/r/5A702111.7090907@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
71a36944 |
|
31-Jan-2018 |
Changwei Ge <ge.changwei@h3c.com> |
ocfs2: try to reuse extent block in dealloc without meta_alloc A crash issue was reported by John Lightsey with a call trace as follows: ocfs2_split_extent+0x1ad3/0x1b40 [ocfs2] ocfs2_change_extent_flag+0x33a/0x470 [ocfs2] ocfs2_mark_extent_written+0x172/0x220 [ocfs2] ocfs2_dio_end_io+0x62d/0x910 [ocfs2] dio_complete+0x19a/0x1a0 do_blockdev_direct_IO+0x19dd/0x1eb0 __blockdev_direct_IO+0x43/0x50 ocfs2_direct_IO+0x8f/0xa0 [ocfs2] generic_file_direct_write+0xb2/0x170 __generic_file_write_iter+0xc3/0x1b0 ocfs2_file_write_iter+0x4bb/0xca0 [ocfs2] __vfs_write+0xae/0xf0 vfs_write+0xb8/0x1b0 SyS_write+0x4f/0xb0 system_call_fastpath+0x16/0x75 The BUG code told that extent tree wants to grow but no metadata was reserved ahead of time. From my investigation into this issue, the root cause is that although enough metadata is not reserved, there should be enough for following use. Rightmost extent is merged into its left one due to a certain times of marking extent written. Because during marking extent written, we got many physically continuous extents. At last, an empty extent showed up and the rightmost path is removed from extent tree. Add a new mechanism to reuse extent block cached in dealloc which were just unlinked from extent tree to solve this crash issue. Criteria is that during marking extents *written*, if extent rotation and merging results in unlinking extent with growing extent tree later without any metadata reserved ahead of time, try to reuse those extents in dealloc in which deleted extents are cached. Also, this patch addresses the issue John reported that ::dw_zero_count is not calculated properly. After applying this patch, the issue John reported was gone. Thanks for the reproducer provided by John. And this patch has passed ocfs2-test(29 cases) suite running by New H3C Group.
[ge.changwei@h3c.com: fix static checker warning] Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F29196AE@H3CMLB12-EX.srv.huawei-3com.com [akpm@linux-foundation.org: brelse(NULL) is legal] Link: http://lkml.kernel.org/r/1515479070-32653-2-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reported-by: John Lightsey <john@nixnuts.net> Tested-by: John Lightsey <john@nixnuts.net> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Mark Fasheh <mfasheh@versity.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
63de8bd9 |
|
31-Jan-2018 |
Changwei Ge <ge.changwei@h3c.com> |
ocfs2: make metadata estimation accurate and clear The current code assumes that ::w_unwritten_list always holds exactly one item. This is not right and is hard to understand. So improve how unwritten items are counted. Link: http://lkml.kernel.org/r/1515479070-32653-1-git-send-email-ge.changwei@h3c.com Signed-off-by: Changwei Ge <ge.changwei@h3c.com> Reported-by: John Lightsey <john@nixnuts.net> Tested-by: John Lightsey <john@nixnuts.net> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
3e4c56d4 |
|
15-Nov-2017 |
alex chen <alex.chen@huawei.com> |
ocfs2: ip_alloc_sem should be taken in ocfs2_get_block()
ip_alloc_sem should be taken in ocfs2_get_block() when reading file in DIRECT mode to prevent concurrent access to extent tree with ocfs2_dio_end_io_write(), which may cause BUGON in the following situation:
read file 'A'                               end_io of writing file 'A'
vfs_read
 __vfs_read
  ocfs2_file_read_iter
   generic_file_read_iter
    ocfs2_direct_IO
     __blockdev_direct_IO
      do_blockdev_direct_IO
       do_direct_IO
        get_more_blocks
         ocfs2_get_block
          ocfs2_extent_map_get_blocks
           ocfs2_get_clusters
            ocfs2_get_clusters_nocache()
             ocfs2_search_extent_list
              return the index of record which
              contains the v_cluster, that is
              v_cluster > rec[i]->e_cpos.
                                            ocfs2_dio_end_io
                                             ocfs2_dio_end_io_write
                                              down_write(&oi->ip_alloc_sem);
                                              ocfs2_mark_extent_written
                                               ocfs2_change_extent_flag
                                                ocfs2_split_extent
                                                 ...
                                             --> modify the rec[i]->e_cpos, resulting
                                                 in v_cluster < rec[i]->e_cpos.
             BUG_ON(v_cluster < le32_to_cpu(rec->e_cpos))
[alex.chen@huawei.com: v3] Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Link: http://lkml.kernel.org/r/59EF3614.6050008@huawei.com Fixes: c15471f79506 ("ocfs2: fix sparse file & data ordering issue in direct io") Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Reviewed-by: Gang He <ghe@suse.com> Acked-by: Changwei Ge <ge.changwei@h3c.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
93407472 |
|
27-Feb-2017 |
Fabian Frederick <fabf@skynet.be> |
fs: add i_blocksize() Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs branch. This patch also fixes multiple checkpatch warnings: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Thanks to Andrew Morton for suggesting more appropriate function instead of macro. [geliangtang@gmail.com: truncate: use i_blocksize()] Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Geliang Tang <geliangtang@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
07f38d97 |
|
12-Dec-2016 |
piaojun <piaojun@huawei.com> |
ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock() 'page' parameter in ocfs2_write_end_nolock() is never used. Link: http://lkml.kernel.org/r/582FD91A.5000902@huawei.com Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <jiangqi903@gmail.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
aef73a61 |
|
09-Dec-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: fix bad pointer cast generic/188 triggered a dmesg stack trace because the dio completion was casting a buffer head to an on-disk inode, which is whacky. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
dbf896fc |
|
01-Dec-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: always unlock when completing dio writes Always unlock the inode when completing dio writes, even if an error has occurred. The caller already checks the inode and unlocks it if needed, so we might as well reduce contention. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
08554955 |
|
09-Nov-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: don't eat io errors during _dio_end_io_write ocfs2_dio_end_io_write eats whatever errors may happen, which means that write errors do not propagate to userspace. Fix that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
06a70305 |
|
09-Nov-2016 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: prohibit refcounted swapfiles The swapfile mechanism calls bmap once to find all the swap file mappings, which means that we cannot properly support CoW remapping. Therefore, error out if the swap code tries to call bmap on a refcounted file. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
#
e64855c6 |
|
04-Nov-2016 |
Jan Kara <jack@suse.cz> |
fs: Add helper to clean bdev aliases under a bh and use it Add a helper function that clears buffer heads from a block device aliasing passed bh. Use this helper function from filesystems instead of the original unmap_underlying_metadata() to save some boilerplate code and also have a better name for the functionality, since it has not been unmapping anything for a *long* time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
c33f0785 |
|
30-Sep-2016 |
Eric Ren <zren@suse.com> |
ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally. In this testcase, we create a 2*CLUSTER_SIZE file and mmap() on it; there are 2 process repeatedly performing the following operations respectively: one is doing memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the another is playing ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE) again and again. This is the backtrace when the deadlock happens: __wait_on_bit_lock+0x50/0xa0 __lock_page+0xb7/0xc0 ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2] do_page_mkwrite+0x66/0xc0 handle_mm_fault+0x685/0x1350 __do_page_fault+0x1d8/0x4d0 trace_do_page_fault+0x37/0xf0 do_async_page_fault+0x19/0x70 async_page_fault+0x28/0x30 In ocfs2_write_begin_nolock(), we first grab the pages and then allocate disk space for this write; ocfs2_try_to_free_truncate_log() will be called if -ENOSPC is returned; if we're lucky to get enough clusters, which is usually the case, we start over again. But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we will deadlock when trying to grab the target page again. Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write(). Another deadlock will happen in __do_page_mkwrite() if ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED, and along with a locked target page. These two errors fail on the same path, so fix them by unlocking the target page manually before ocfs2_free_write_ctxt(). Jan Kara helps me clear out the JBD2 part, and suggest the hint for root cause. Changes since v1: 1. Also put ENOMEM error case into consideration. 
Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: He Gang <ghe@suse.com> Acked-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
078cd827 |
|
14-Sep-2016 |
Deepa Dinamani <deepa.kernel@gmail.com> |
fs: Replace CURRENT_TIME with current_time() for inode timestamps CURRENT_TIME macro is not appropriate for filesystems as it doesn't use the right granularity for filesystem timestamps. Use current_time() instead. CURRENT_TIME is also not y2038 safe. This is also in preparation for the patch that transitions vfs timestamps to use 64 bit time and hence make them y2038 safe. As part of the effort current_time() will be extended to do range checks. Hence, it is necessary for all file system timestamps to use current_time(). Also, current_time() will be transitioned along with vfs to be y2038 safe. Note that whenever a single call to current_time() is used to change timestamps in different inodes, it is because they share the same time granularity. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felipe Balbi <balbi@kernel.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Acked-by: David Sterba <dsterba@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2070ad1a |
|
02-Aug-2016 |
Eric Ren <zren@suse.com> |
ocfs2: retry on ENOSPC if sufficient space in truncate log The testcase "mmaptruncate" in ocfs2 test suite always fails with ENOSPC error on small volume (say less than 10G). This testcase repeatedly performs "extend" and "truncate" on a file. Continuously, it truncates the file to 1/2 of the size, and then extends to 100% of the size. The main bitmap will quickly run out of space because the "truncate" code prevent truncate log from being flushed by ocfs2_schedule_truncate_log_flush(osb, 1), while truncate log may have cached lots of clusters. So retry to allocate after flushing truncate log when ENOSPC is returned. And we cannot reuse the deleted blocks before the transaction committed. Fortunately, we already have a function to do this - ocfs2_try_to_free_truncate_log(). Just need to remove the "static" modifier and put it into the right place. The "unlock"/"lock" code isn't elegant, but there seems to be no better option. [zren@suse.com: locking fix] Link: http://lkml.kernel.org/r/1468031546-4797-1-git-send-email-zren@suse.com Link: http://lkml.kernel.org/r/1466586469-5541-1-git-send-email-zren@suse.com Signed-off-by: Eric Ren <zren@suse.com> Reviewed-by: Gang He <ghe@suse.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dfec8a14 |
|
05-Jun-2016 |
Mike Christie <mchristi@redhat.com> |
fs: have ll_rw_block users pass in op and flags separately This has ll_rw_block users pass in the operation and flags separately, so ll_rw_block can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
|
#
93c76a3d |
|
04-Dec-2015 |
Al Viro <viro@zeniv.linux.org.uk> |
file_inode(f)->i_mapping is f->f_mapping Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7b9743eb |
|
11-Apr-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
ocfs2: don't open-code inode_lock/inode_unlock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c8b8e32d |
|
07-Apr-2016 |
Christoph Hellwig <hch@lst.de> |
direct-io: eliminate the offset argument to ->direct_IO Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superfluous argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
ea1754a0 |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
09cbfeaf |
|
01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time ago with the promise that one day it would be possible to implement the page cache with bigger chunks than PAGE_SIZE. This promise never materialized, and it is unlikely to. We have many places where PAGE_CACHE_SIZE is assumed to be equal to PAGE_SIZE. And it's a constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using the script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is a revert of the changes to the PAGE_CACHE_ALIGN definition: we are going to drop it later. There are a few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation will also be addressed in a separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
28888681 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write() The code should call ocfs2_free_alloc_context() to free meta_ac & data_ac before calling ocfs2_run_deallocs(), because ocfs2_run_deallocs() will acquire the system inode's i_mutex, which is held by meta_ac. So release the lock before calling ocfs2_run_deallocs(). Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.") Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Acked-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ce170828 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: fix disk file size and memory file size mismatch When doing append direct write in an already allocated cluster, and fast path in ocfs2_dio_get_block() is triggered, function ocfs2_dio_end_io_write() will be skipped as there is no context allocated. As a result, the disk file size will not be changed as it should be. The solution is to skip fast path when we are about to change file size. Fixes: af1310367f41 ("ocfs2: fix sparse file & data ordering issue in direct io.") Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Acked-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a86a72a4 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: take ip_alloc_sem in ocfs2_dio_get_block & ocfs2_dio_end_io_write Take ip_alloc_sem to prevent concurrent access to extent tree, which may cause the extent tree in an unstable state. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e63890f3 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: fix ip_unaligned_aio deadlock with dio work queue In the current implementation of unaligned aio+dio, the lock order behaves as follows: in user process context: -> call io_submit() -> get i_mutex <== window1 -> get ip_unaligned_aio -> submit direct io to block device -> release i_mutex -> io_submit() return in dio work queue context (the work queue is created in __blockdev_direct_IO): -> release ip_unaligned_aio <== window2 -> get i_mutex -> clear unwritten flag & change i_size -> release i_mutex There is a limit on the number of dio work queue threads, 256 by default. If all 256 threads are in the above 'window2' stage, and there is a user process in the 'window1' stage, the system will deadlock: the user process holds i_mutex while waiting for the ip_unaligned_aio lock, and a direct bio holds the ip_unaligned_aio mutex while waiting for a dio work queue thread to be scheduled, but all the dio work queue threads are waiting for i_mutex in 'window2'. This case only happened in a test which sends a large number (more than 256) of aio requests in one io_submit() call. My design is to remove the ip_unaligned_aio lock and change unaligned aio to sync io instead. Like the ip_unaligned_aio lock, this serializes the unaligned aio dio. [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi] Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c15471f7 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: fix sparse file & data ordering issue in direct io There are mainly three issues in the direct io code path after commit 24c40b329e03 ("ocfs2: implement ocfs2_direct_IO_write"): * Does not support sparse file. * Does not support data ordering. eg: when write to a file hole, it will alloc extent first. If system crashed before io finished, data will corrupt. * Potential risk when doing aio+dio. The -EIOCBQUEUED return value is likely to be ignored by ocfs2_direct_IO_write(). To resolve above problems, re-design direct io code with following ideas: * Use buffer io to fill in holes. And this will make better performance also. * Clear unwritten after direct write finished. So we can make sure meta data changes after data write to disk. (Unwritten extent is invisible to user, from user's view, meta data is not changed when allocate an unwritten extent.) * Clear ocfs2_direct_IO_write(). Do all ending work in end_io. This patch has passed fs,dio,ltp-aiodio.part1,ltp-aiodio.part2,ltp-aiodio.part4 test cases of ltp. For performance improvement, see following test result: ocfs2 cluster size 1MB, ocfs2 volume is mounted on /mnt/. 
The original way: + rm /mnt/test.img -f + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct 1048576+0 records in 1048576+0 records out 4294967296 bytes (4.3 GB) copied, 1707.83 s, 2.5 MB/s + rm /mnt/test.img -f + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct 16384+0 records in 16384+0 records out 4294967296 bytes (4.3 GB) copied, 582.705 s, 7.4 MB/s After this patch: + rm /mnt/test.img -f + dd if=/dev/zero of=/mnt/test.img bs=4K count=1048576 oflag=direct 1048576+0 records in 1048576+0 records out 4294967296 bytes (4.3 GB) copied, 64.6412 s, 66.4 MB/s + rm /mnt/test.img -f + dd if=/dev/zero of=/mnt/test.img bs=256K count=16384 oflag=direct 16384+0 records in 16384+0 records out 4294967296 bytes (4.3 GB) copied, 34.7611 s, 124 MB/s Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4506cfb6 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: record UNWRITTEN extents when populate write desc To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. There is still one issue in the direct write procedure. phase 1: alloc extent with UNWRITTEN flag phase 2: submit direct data to disk, add zero page to page cache phase 3: clear UNWRITTEN flag when data has been written to disk Consider 2 direct writes A (0~3KB) and B (4~7KB) writing to the same cluster 0~7KB (cluster size 8KB). Write request A arrives at phase 2 first and zeroes the region (4~7KB). Before request A enters phase 3, request B arrives at phase 2 and zeroes the region (0~3KB). It is as if request B steps on request A. To resolve this issue, we should let request B know that this cluster is already being zeroed, to prevent it from stepping on the previous write request. This patch adds the function ocfs2_unwritten_check() to do this job. It records all clusters that are under direct write (in the 'ip_unwritten_list' member of the inode info), and prevents a later direct write to the same cluster from doing the zero work again. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
2de6a3c7 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: return the physical address in ocfs2_write_cluster To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. Direct io needs to get the physical address from write_begin, to map the user page. This patch is to change the arg 'phys' of ocfs2_write_cluster to a pointer, so it can be retrieved to write_begin. And we can retrieve it to the direct io procedure. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
46e62556 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: do not change i_size in write_end for direct io To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. Append direct io do not change i_size in get block phase. It only move to orphan when starting write. After data is written to disk, it will delete itself from orphan and update i_size. So skip i_size change section in write_begin for direct io. And when there is no extents alloc, no meta data changes needed for direct io (since write_begin start trans for 2 reason: alloc extents & change i_size. Now none of them needed). So we can skip start trans procedure. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
65c4db8c |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: test target page before change it To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. Direct io data will not appear in buffer. The w_target_page member will not be filled by direct io. So avoid using it when it's NULL. Unlike buffer io and mmap, direct io will call write_begin with more than 1 page at a time. So the target_index is not sufficient to describe the actual data. Change it to a range that starts at target_index and ends at end_index. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b46637d5 |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: use c_new to indicate newly allocated extents To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. There is a problem in ocfs2's direct io implementation: if the system crashes after extents are allocated, and before data return, we will get an extent with dirty data on disk. This problem violates the journal=order semantics, which means meta changes take effect after data is written to disk. To resolve this issue, direct write can use the UNWRITTEN flag to describe an extent during direct data writeback. The direct write procedure should act in the following order: phase 1: alloc extent with UNWRITTEN flag phase 2: submit direct data to disk, add zero page to page cache phase 3: clear UNWRITTEN flag when data has been written to disk This patch changes the 'c_unwritten' member of ocfs2_write_cluster_desc to 'c_clear_unwritten', which indicates whether to clear the unwritten flag. It does not care whether an extent is allocated or not. 'c_new' is used to specify a newly allocated extent. So the direct io procedure can use c_clear_unwritten to control the UNWRITTEN bit on an extent. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c1ad1e3c |
|
25-Mar-2016 |
Ryan Ding <ryan.ding@oracle.com> |
ocfs2: add ocfs2_write_type_t type to identify the caller of write Patchset: fix ocfs2 direct io code path to support sparse file and data ordering semantics The idea is to use buffer io (more precisely, the interface ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zero work beyond block size, and to not clear the UNWRITTEN flag until direct io data has been written to disk, which can prevent data corruption when the system crashes during a direct write. We will also achieve better performance: e.g. dd direct write new file with block size 4KB: before this patchset: 2.5 MB/s after this patchset: 66.4 MB/s This patch (of 8): To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock. Remove unused args filp & flags. Add new arg type. The type is one of buffer/direct/mmap, indicating the 3 ways to perform a write. The buffer/mmap types are already implemented. The direct type will be implemented later. Signed-off-by: Ryan Ding <ryan.ding@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a4a8481f |
|
26-Feb-2016 |
Guozhonghua <guozhonghua@h3c.com> |
ocfs2: unlock inode if deleting inode from orphan fails When doing append direct io cleanup, if deleting inode fails, it goes out without unlocking inode, which will cause the inode deadlock. This issue was introduced by commit cf1776a9e834 ("ocfs2: fix a tiny race when truncate dio orohaned entry"). Signed-off-by: Guozhonghua <guozhonghua@h3c.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
187372a3 |
|
07-Feb-2016 |
Christoph Hellwig <hch@lst.de> |
direct-io: always call ->end_io if non-NULL This way we can pass back errors to the file system, and allow for cleanup required for all direct I/O invocations. Also allow the ->end_io handlers to return errors on their own, so that I/O completion errors can be passed on to the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
|
#
5955102c |
|
22-Jan-2016 |
Al Viro <viro@zeniv.linux.org.uk> |
wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4e357b93 |
|
05-Nov-2015 |
jiangyiwen <jiangyiwen@huawei.com> |
ocfs2: fill in the unused portion of the block with zeros by dio_zero_block() A simplified test case is (this case from Ryan): 1) dd if=/dev/zero of=/mnt/hello bs=512 count=1 oflag=direct; 2) truncate /mnt/hello -s 2097152 File 'hello' does not exist before the test. After these commands, file 'hello' should be all zeros, but bytes 512~4096 contain random data. Fix this by setting the bh state to new when a new block is obtained, so that direct_io_worker()->dio_zero_block() fills in the unused portion of the block with zeros. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d162eaad |
|
05-Nov-2015 |
Norton.Zhu <norton.zhu@huawei.com> |
ocfs2_direct_IO_write() misses ocfs2_is_overwrite() error code If ocfs2_is_overwrite() fails, ocfs2_direct_IO_write() may still return success to the caller. Signed-off-by: Norton.Zhu <norton.zhu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7ecef14a |
|
04-Sep-2015 |
Joe Perches <joe@perches.com> |
ocfs2: neaten do_error, ocfs2_error and ocfs2_abort These uses sometimes do and sometimes don't have '\n' terminations. Make the uses consistently use '\n' terminations and remove the newline from the functions. Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7f27ec97 |
|
04-Sep-2015 |
yangwenfang <vicky.yangwenfang@huawei.com> |
ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock() 1: After we call ocfs2_journal_access_di() in ocfs2_write_begin(), jbd2_journal_restart() may also be called, in this function transaction A's t_updates-- and obtains a new transaction B. If jbd2_journal_commit_transaction() is happened to commit transaction A, when t_updates==0, it will continue to complete commit and unfile buffer. So when jbd2_journal_dirty_metadata(), the handle is pointed a new transaction B, and the buffer head's journal head is already freed, jh->b_transaction == NULL, jh->b_next_transaction == NULL, it returns EINVAL, So it triggers the BUG_ON(status). thread 1 jbd2 ocfs2_write_begin jbd2_journal_commit_transaction ocfs2_write_begin_nolock ocfs2_start_trans jbd2__journal_start(t_updates+1, transaction A) ocfs2_journal_access_di ocfs2_write_cluster_by_desc ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ocfs2_extend_rotate_transaction jbd2_journal_restart (t_updates-1,transaction B) t_updates==0 __jbd2_journal_refile_buffer (jh->b_transaction = NULL) ocfs2_write_end ocfs2_write_end_nolock ocfs2_journal_dirty jbd2_journal_dirty_metadata(bug) ocfs2_commit_trans 2. In ext4, I found that: jbd2_journal_get_write_access() called by ext4_write_end. ext4_write_begin ext4_journal_start __ext4_journal_start_sb ext4_journal_check_start jbd2__journal_start ext4_write_end ext4_mark_inode_dirty ext4_reserve_inode_write ext4_journal_get_write_access jbd2_journal_get_write_access ext4_mark_iloc_dirty ext4_do_update_inode ext4_handle_dirty_metadata jbd2_journal_dirty_metadata 3. So I think we should put ocfs2_journal_access_di before ocfs2_journal_dirty in the ocfs2_write_end. and it works well after my modification. 
Signed-off-by: vicky <vicky.yangwenfang@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Zhangguanghui <zhang.guanghui@h3c.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6ab855a9 |
|
04-Sep-2015 |
WeiWei Wang <wangww631@huawei.com> |
ocfs2: add ip_alloc_sem in direct IO to protect allocation changes In ocfs2, ip_alloc_sem is used to protect allocation changes on the node. In direct IO, we add ip_alloc_sem to protect data consistency in the race between direct-io and ocfs2_truncate_file (buffer io uses ip_alloc_sem already). Although the inode->i_mutex lock is used to avoid concurrency in the above situation, I think ip_alloc_sem is still needed because protecting allocation changes is significant. Other filesystems like ext4 also use an rw_semaphore to protect data consistency in the get_block-vs-truncate race by other means. So ip_alloc_sem in ocfs2 direct io is needed. Signed-off-by: Weiwei Wang <wangww631@huawei.com> Signed-off-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
faaebf18 |
|
04-Sep-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix several issues of append dio 1) Take rw EX lock in case of append dio. 2) Explicitly treat the error code -EIOCBQUEUED as normal. 3) Set di_bh to NULL after brelse if it may be used again later. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
512f62ac |
|
04-Sep-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix race between dio and recover orphan During direct IO the inode is first added to the orphan dir and then deleted from it. There is a race window in which the orphan entry can be deleted twice, triggering the BUG when validating OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan. ocfs2_direct_IO_write ... ocfs2_add_inode_to_orphan >>>>>>>> race window. 1) another node may rm the file and then go down; this node takes care of orphan recovery and clears the flag OCFS2_DIO_ORPHANED_FL. 2) since the rw lock is unlocked, it may race with another orphan recovery and append dio. ocfs2_del_inode_from_orphan So take the inode mutex lock when recovering orphans and do the rw unlock at the end of aio write in case of append dio. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
32e5a2a2 |
|
06-Aug-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix shift left overflow When using a large volume, for example 9T volume with 2T already used, frequent creation of small files with O_DIRECT when the IO is not cluster aligned may clear sectors in the wrong place. This will cause filesystem corruption. This is because p_cpos is a u32. When calculating the corresponding sector it should be converted to u64 first, otherwise it may overflow. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
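The overflow described in this commit can be reproduced in user space. The following is a minimal sketch, assuming a hypothetical geometry of 1 MiB clusters and 512-byte sectors (so the shift is 11 bits); the function names are illustrative and are not the ocfs2 helpers.

```c
#include <stdint.h>

/* Hypothetical geometry: 1 MiB clusters, 512-byte sectors -> 2^11 sectors per cluster. */
#define SECTORS_PER_CLUSTER_BITS 11

/* Buggy form: the shift is evaluated in 32 bits, so the high bits are lost
 * before the result is widened to 64 bits. */
static uint64_t cluster_to_sector_buggy(uint32_t p_cpos)
{
    return p_cpos << SECTORS_PER_CLUSTER_BITS;
}

/* Fixed form: widen to 64 bits first, then shift. */
static uint64_t cluster_to_sector_fixed(uint32_t p_cpos)
{
    return (uint64_t)p_cpos << SECTORS_PER_CLUSTER_BITS;
}
```

With these assumed sizes, a physical cluster offset of 0x300000 (3 TiB into the volume) yields sector 0x80000000 from the buggy form and 0x180000000 from the fixed form, so the buggy form points at a location 2 TiB too low.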
|
#
ae1f0814 |
|
24-Jun-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix wrong check in ocfs2_direct_IO_get_blocks contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared with clusters_to_alloc. So convert it to clusters first. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
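The unit mismatch behind this fix can be illustrated with a small sketch. It assumes a power-of-two number of blocks per cluster and rounds up when converting; both the rounding choice and the helper names are assumptions of this example, not taken from the patch.

```c
#include <stdint.h>

/* Convert a block count to clusters, rounding up.
 * bits = log2(blocks per cluster); hypothetical helper for illustration. */
static uint32_t blocks_to_clusters(uint64_t blocks, unsigned int bits)
{
    return (uint32_t)((blocks + (1ull << bits) - 1) >> bits);
}

/* Wrong: compares a block count directly against a cluster count. */
static int covers_naive(uint64_t contig_blocks, uint32_t clusters_to_alloc)
{
    return contig_blocks >= clusters_to_alloc;
}

/* Right: bring both sides to the same unit before comparing. */
static int covers_converted(uint64_t contig_blocks, uint32_t clusters_to_alloc,
                            unsigned int bits)
{
    return blocks_to_clusters(contig_blocks, bits) >= clusters_to_alloc;
}
```

For example, with 256 blocks per cluster (bits = 8), 256 contiguous blocks are a single cluster. The naive check reports that a request for 4 clusters is covered, while the converted check correctly reports that it is not.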
|
#
fa5a0eb3 |
|
24-Jun-2015 |
WeiWei Wang <wangww631@huawei.com> |
ocfs2: remove OCFS2_IOCB_SEM lock type in direct io In ocfs2 direct read/write, OCFS2_IOCB_SEM lock type is used to protect inode->i_alloc_sem rw semaphore lock in the earlier kernel version. However, in the latest kernel, inode->i_alloc_sem rw semaphore lock is not used at all, so OCFS2_IOCB_SEM lock type needs to be removed. Signed-off-by: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
cf1776a9 |
|
24-Jun-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix a tiny race when truncate dio orphaned entry If dio crashes, it leaves an entry in the orphan dir, and the orphan scan takes care of the cleanup. There is a tiny race in which the same entry is truncated twice, which then triggers the BUG in ocfs2_del_inode_from_orphan. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
14a5275d |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: do not use ocfs2_zero_extend during direct IO In ocfs2_direct_IO_write, we use ocfs2_zero_extend to zero allocated clusters when the IO is not cluster aligned. But ocfs2_zero_extend uses the page cache, so it may clear data that blockdev_direct_IO has already written. We should use blkdev_issue_zeroout instead of ocfs2_zero_extend during direct IO. So fix this issue by introducing ocfs2_direct_IO_zero_extend and ocfs2_direct_IO_extend_no_holes. Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Tested-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
37a8d89a |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: take inode lock when get clusters We need to take the inode lock when calling ocfs2_get_clusters, and use GFP_NOFS instead of GFP_KERNEL. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7e9b1955 |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: no need get dinode bh when zeroing extend Since di_bh won't be used when zeroing extend, set it to NULL. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
bdd86215 |
|
14-Apr-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: fix a typing error in ocfs2_direct_IO_write Only when direct IO succeeds do we need to consider zeroing out in case the IO is not cluster aligned. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
22c6186e |
|
16-Mar-2015 |
Omar Sandoval <osandov@osandov.com> |
direct_IO: remove rw from a_ops->direct_IO() Now that no one is using rw, remove it completely. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
6f673763 |
|
16-Mar-2015 |
Omar Sandoval <osandov@osandov.com> |
direct_IO: use iov_iter_rw() instead of rw everywhere The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
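The difference between the two styles of check mentioned in this commit can be shown with a small user-space model. The constants below are illustrative only (direction in bit 0, one made-up modifier flag) and are not the kernel's definitions; `normalized_rw()` merely mimics the idea of a helper that always returns a plain READ or WRITE.

```c
/* Illustrative values: bit 0 is the direction, FLAG_SYNC is a hypothetical modifier. */
enum {
    DIR_READ  = 0,
    DIR_WRITE = 1,
    FLAG_SYNC = 1 << 4
};

/* Equality check: breaks as soon as any modifier bit is set alongside WRITE. */
static int is_write_equality(int rw)
{
    return rw == DIR_WRITE;
}

/* Bitwise check: looks only at the direction bit. */
static int is_write_bitwise(int rw)
{
    return (rw & DIR_WRITE) != 0;
}

/* Normalizing helper: strips modifiers so callers can safely compare with ==. */
static int normalized_rw(int rw)
{
    return rw & DIR_WRITE;
}
```

A write carrying the modifier flag is missed by the equality check but recognized by the bitwise one; once the value is normalized, either style of comparison gives the same answer.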
|
#
17f8c842 |
|
16-Mar-2015 |
Omar Sandoval <osandov@osandov.com> |
Remove rw from {,__,do_}blockdev_direct_IO() Most filesystems call through to these at some point, so we'll start here. Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
e2e40f2c |
|
22-Feb-2015 |
Christoph Hellwig <hch@lst.de> |
fs: move struct kiocb to fs.h struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
49255dce |
|
16-Feb-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks Allow blocks allocation in ocfs2_direct_IO_get_blocks. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
24c40b32 |
|
16-Feb-2015 |
Joseph Qi <joseph.qi@huawei.com> |
ocfs2: implement ocfs2_direct_IO_write Implement ocfs2_direct_IO_write. Add the inode to orphan dir first, and then delete it once append O_DIRECT finished. This is to make sure block allocation and inode size are consistent. [akpm@linux-foundation.org: fix it for "block: Add discard flag to blkdev_issue_zeroout() function"] Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Weiwei Wang <wangww631@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Xuejiufei <xuejiufei@huawei.com> Cc: alex chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
136f49b9 |
|
18-Dec-2014 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: fix journal commit deadlock For buffered writes, the page lock is taken in write_begin and released in write_end. In ocfs2_write_end_nolock(), before it unlocks the page in ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), which asks for the read lock of journal->j_trans_barrier. Holding the page lock while asking for journal->j_trans_barrier breaks the locking order. This causes a deadlock with the journal commit threads: ocfs2cmt takes the write lock of journal->j_trans_barrier first, then wakes up kjournald2 to do the commit work, and finally waits until it is done. To commit the journal, kjournald2 needs to flush data first, which requires the cache page lock. Since some ocfs2 cluster locks are held by the writing process, this deadlock may hang the whole cluster. Unlocking the pages before ocfs2_run_deallocs() fixes the locking order; also put the unlock before ocfs2_commit_trans() so that the page lock is released before j_trans_barrier, preserving the unlocking order. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: <stable@vger.kernel.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
61fb9ea4 |
|
10-Dec-2014 |
jiangyiwen <jiangyiwen@huawei.com> |
ocfs2: do not set filesystem readonly if link down Do not set the filesystem readonly if the storage link is down. In this case, metadata is not corrupted and only -EIO is returned. And if it is indeed corrupted metadata, it has already called ocfs2_error() in ocfs2_validate_inode_block(). Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f775da2f |
|
09-Oct-2014 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: fix deadlock due to wrong locking order To commit the ocfs2 journal, the ocfs2 journal thread acquires osb->journal->j_trans_barrier and wakes up the jbd2 commit thread, then waits until the jbd2 commit thread is done. In ordered journal mode, jbd2 needs to flush dirty data pages first, which requires the page lock. So osb->journal->j_trans_barrier should be taken before the page lock. But ocfs2_write_zero_page() and ocfs2_write_begin_inline() violate this locking order, which causes a deadlock and hangs the whole cluster. One deadlock that was caught is the following:
    PID: 13449 TASK: ffff8802e2f08180 CPU: 31 COMMAND: "oracle"
    #0 [ffff8802ee3f79b0] __schedule at ffffffff8150a524
    #1 [ffff8802ee3f7a58] schedule at ffffffff8150acbf
    #2 [ffff8802ee3f7a68] rwsem_down_failed_common at ffffffff8150cb85
    #3 [ffff8802ee3f7ad8] rwsem_down_read_failed at ffffffff8150cc55
    #4 [ffff8802ee3f7ae8] call_rwsem_down_read_failed at ffffffff812617a4
    #5 [ffff8802ee3f7b50] ocfs2_start_trans at ffffffffa0498919 [ocfs2]
    #6 [ffff8802ee3f7ba0] ocfs2_zero_start_ordered_transaction at ffffffffa048b2b8 [ocfs2]
    #7 [ffff8802ee3f7bf0] ocfs2_write_zero_page at ffffffffa048e9bd [ocfs2]
    #8 [ffff8802ee3f7c80] ocfs2_zero_extend_range at ffffffffa048ec83 [ocfs2]
    #9 [ffff8802ee3f7ce0] ocfs2_zero_extend at ffffffffa048edfd [ocfs2]
    #10 [ffff8802ee3f7d50] ocfs2_extend_file at ffffffffa049079e [ocfs2]
    #11 [ffff8802ee3f7da0] ocfs2_setattr at ffffffffa04910ed [ocfs2]
    #12 [ffff8802ee3f7e70] notify_change at ffffffff81187d29
    #13 [ffff8802ee3f7ee0] do_truncate at ffffffff8116bbc1
    #14 [ffff8802ee3f7f50] sys_ftruncate at ffffffff8116bcbd
    #15 [ffff8802ee3f7f80] system_call_fastpath at ffffffff81515142
    RIP: 00007f8de750c6f7 RSP: 00007fffe786e478 RFLAGS: 00000206
    RAX: 000000000000004d RBX: ffffffff81515142 RCX: 0000000000000000
    RDX: 0000000000000200 RSI: 0000000000028400 RDI: 000000000000000d
    RBP: 00007fffe786e040 R8: 0000000000000000 R9: 000000000000000d
    R10: 0000000000000000 R11: 0000000000000206 R12: 000000000000000d
    R13: 00007fffe786e710 R14: 00007f8de70f8340 R15: 0000000000028400
    ORIG_RAX: 000000000000004d CS: 0033 SS: 002b
    crash64> bt
    PID: 7610 TASK: ffff88100fd56140 CPU: 1 COMMAND: "ocfs2cmt"
    #0 [ffff88100f4d1c50] __schedule at ffffffff8150a524
    #1 [ffff88100f4d1cf8] schedule at ffffffff8150acbf
    #2 [ffff88100f4d1d08] jbd2_log_wait_commit at ffffffffa01274fd [jbd2]
    #3 [ffff88100f4d1d98] jbd2_journal_flush at ffffffffa01280b4 [jbd2]
    #4 [ffff88100f4d1dd8] ocfs2_commit_cache at ffffffffa0499b14 [ocfs2]
    #5 [ffff88100f4d1e38] ocfs2_commit_thread at ffffffffa0499d38 [ocfs2]
    #6 [ffff88100f4d1ee8] kthread at ffffffff81090db6
    #7 [ffff88100f4d1f48] kernel_thread_helper at ffffffff81516284
    crash64> bt
    PID: 7609 TASK: ffff88100f2d4480 CPU: 0 COMMAND: "jbd2/dm-20-86"
    #0 [ffff88100def3920] __schedule at ffffffff8150a524
    #1 [ffff88100def39c8] schedule at ffffffff8150acbf
    #2 [ffff88100def39d8] io_schedule at ffffffff8150ad6c
    #3 [ffff88100def39f8] sleep_on_page at ffffffff8111069e
    #4 [ffff88100def3a08] __wait_on_bit_lock at ffffffff8150b30a
    #5 [ffff88100def3a58] __lock_page at ffffffff81110687
    #6 [ffff88100def3ab8] write_cache_pages at ffffffff8111b752
    #7 [ffff88100def3be8] generic_writepages at ffffffff8111b901
    #8 [ffff88100def3c48] journal_submit_data_buffers at ffffffffa0120f67 [jbd2]
    #9 [ffff88100def3cf8] jbd2_journal_commit_transaction at ffffffffa0121372 [jbd2]
    #10 [ffff88100def3e68] kjournald2 at ffffffffa0127a86 [jbd2]
    #11 [ffff88100def3ee8] kthread at ffffffff81090db6
    #12 [ffff88100def3f48] kernel_thread_helper at ffffffff81516284
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Alex Chen <alex.chen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
31b14039 |
|
04-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
switch {__,}blockdev_direct_IO() to iov_iter Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
d8d3d94b |
|
04-Mar-2014 |
Al Viro <viro@zeniv.linux.org.uk> |
pass iov_iter to ->direct_IO() unmodified, for now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
2931cdcb |
|
03-Apr-2014 |
Darrick J. Wong <darrick.wong@oracle.com> |
ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file Currently, ocfs2_sync_file grabs i_mutex and forces the current journal transaction to complete. This isn't terribly efficient, since sync_file really only needs to wait for the last transaction involving that inode to complete, and this doesn't require i_mutex. Therefore, implement the necessary bits to track the newest tid associated with an inode, and teach sync_file to wait for that instead of waiting for everything in the journal to commit. Furthermore, only issue the flush request to the drive if jbd2 hasn't already done so. This also eliminates the deadlock between ocfs2_file_aio_write() and ocfs2_sync_file(). aio_write takes i_mutex then calls ocfs2_aiodio_wait() to wait for unaligned dio writes to finish. However, if that dio completion involves calling fsync, then we can get into trouble when some ocfs2_sync_file tries to take i_mutex. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c18ceab0 |
|
03-Apr-2014 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: change ip_unaligned_aio to of type mutex from atomic_t There is a problem that waitqueue_active() may check stale data and thus miss a wakeup of threads waiting on ip_unaligned_aio. The only valid values of ip_unaligned_aio are 0 and 1, so we can change it to be of type mutex, which avoids the above problem. Another benefit is that a mutex, which works as FIFO, is fairer than wake_up_all(). Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
41ecc345 |
|
12-Nov-2013 |
Jan Kara <jack@suse.cz> |
ocfs2: simplify ocfs2_invalidatepage() and ocfs2_releasepage() Ocfs2 doesn't do data journalling. Thus its ->invalidatepage and ->releasepage functions never get called on buffers that have journal heads attached. So just use standard variants of functions from buffer.c. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b1214e47 |
|
12-Nov-2013 |
Xue jiufei <xuejiufei@huawei.com> |
ocfs2: fix possible double free in ocfs2_write_begin_nolock When ocfs2_write_cluster_by_desc() fails in ocfs2_write_begin_nolock() because of ENOSPC, it goes to out_quota, freeing data_ac (meta_ac). Then it calls ocfs2_try_to_free_truncate_log() to free space. If enough space is freed, it tries to write again. Unfortunately, if some error happens before ocfs2_lock_allocators(), it goes to out and frees data_ac (meta_ac) again. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
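One common way to make a retry path like this safe is to clear the pointer whenever the object is released, so that a later cleanup label cannot free it a second time. The sketch below is a user-space model of that idea, not the actual patch; `struct alloc_ctx`, `ctx_free()` and `write_with_retry()` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an allocation context such as data_ac or meta_ac. */
struct alloc_ctx {
    int reserved;
};

/* Counts how many times memory was really released, for demonstration only. */
static int free_calls;

/* Free the context and clear the caller's pointer so a second call is a no-op. */
static void ctx_free(struct alloc_ctx **ctx)
{
    if (*ctx) {
        free(*ctx);
        *ctx = NULL;
        free_calls++;
    }
}

/* Models a write that fails once with ENOSPC, releases its reservation,
 * and then either retries or bails out early. */
static int write_with_retry(int fail_before_relock)
{
    int ret = -1;
    struct alloc_ctx *data_ac = calloc(1, sizeof(*data_ac));

    if (!data_ac)
        return ret;

    /* First attempt ran out of space: drop the reservation before retrying. */
    ctx_free(&data_ac);

    /* An error before the reservation is taken again jumps straight to cleanup. */
    if (fail_before_relock)
        goto out;

    data_ac = calloc(1, sizeof(*data_ac));
    if (!data_ac)
        goto out;
    ret = 0;
out:
    /* Safe on every path: data_ac is NULL if it was already released. */
    ctx_free(&data_ac);
    return ret;
}
```

On the early-error path the context is released exactly once; on the successful retry path each of the two reservations is released once.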
|
#
7391a294 |
|
12-Nov-2013 |
Rui Xiang <rui.xiang@huawei.com> |
ocfs2: return ENOMEM when sb_getblk() fails The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
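The error-code mapping described here can be sketched in user space. `get_buf()` below is a hypothetical stand-in for a buffer getter whose only failure mode is allocation failure; it is not sb_getblk() itself.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical buffer type for illustration. */
struct buf {
    char data[512];
};

/* Stand-in for a getter that can only fail because memory is unavailable.
 * simulate_failure forces the failure path for demonstration. */
static struct buf *get_buf(int simulate_failure)
{
    return simulate_failure ? NULL : calloc(1, sizeof(struct buf));
}

/* Report the real cause of the failure to the caller. */
static int read_block(int simulate_failure, struct buf **out)
{
    struct buf *b = get_buf(simulate_failure);

    if (!b)
        return -ENOMEM; /* allocation failed; say so explicitly */

    *out = b;
    return 0;
}
```

Returning the specific code lets callers distinguish memory pressure from other failure classes and react accordingly.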
|
#
06f9da6e |
|
12-Nov-2013 |
Goldwyn Rodrigues <rgoldwyn@suse.de> |
fs/ocfs2: remove unnecessary variable bits_wanted from ocfs2_calc_extend_credits Code cleanup to remove unnecessary variable passed but never used to ocfs2_calc_extend_credits. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f17c20dd |
|
11-Sep-2013 |
Junxiao Bi <junxiao.bi@oracle.com> |
ocfs2: use i_size_read() to access i_size Though ocfs2 uses inode->i_mutex to protect i_size, there are both i_size_read/write() and direct accesses. Clean up all direct access to eliminate confusion. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7b7a8665 |
|
04-Sep-2013 |
Christoph Hellwig <hch@infradead.org> |
direct-io: Implement generic deferred AIO completions Add support to the core direct-io code to defer AIO completions to user context using a workqueue. This replaces opencoded and less efficient code in XFS and ext4 (we save a memory allocation for each direct IO) and will be needed to properly support O_(D)SYNC for AIO. The communication between the filesystem and the direct I/O code requires a new buffer head flag, which is a bit ugly but not avoidable until the direct I/O code stops abusing the buffer_head structure for communicating with the filesystems. Currently this creates a per-superblock unbound workqueue for these completions, which is taken from an earlier patch by Jan Kara. I'm not really convinced about this use and would prefer a "normal" global workqueue with a high concurrency limit, but this needs further discussion. JK: Fixed ext4 part, dynamic allocation of the workqueue. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
c7dd3392 |
|
13-Aug-2013 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page Since ocfs2_cow_file_pos invokes ocfs2_refcount_icow with NULL as the struct file pointer, it finally results in a NULL pointer dereference in ocfs2_duplicate_clusters_by_page. This patch replaces the file pointer with an inode pointer in cow_duplicate_clusters to fix this issue. [jeff.liu@oracle.com: rebased patch against linux-next tree] Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Tao Ma <tm@tao.ma> Tested-by: David Weber <wb@munzinger.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
e5f8d30d |
|
21-May-2013 |
Lukas Czerner <lczerner@redhat.com> |
ocfs2: use ->invalidatepage() length argument ->invalidatepage() aop now accepts range to invalidate so we can make use of it in ocfs2_invalidatepage(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Joel Becker <jlbec@evilplan.org>
|
#
259709b0 |
|
21-May-2013 |
Lukas Czerner <lczerner@redhat.com> |
jbd2: change jbd2_journal_invalidatepage to accept length invalidatepage now accepts a range to invalidate, and two file systems using jbd2 also implement the punch hole feature, which can benefit from this. We need to implement the same thing in the jbd2 layer so that those file systems can take advantage of this functionality. This commit adds a length argument to jbd2_journal_invalidatepage() and updates all instances in ext4 and ocfs2. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
|
#
d47992f8 |
|
21-May-2013 |
Lukas Czerner <lczerner@redhat.com> |
mm: change invalidatepage prototype to accept length Currently there is no way to truncate a partial page where the end truncate point is not at the end of the page. This is because it was not needed, and the existing functionality was enough for the file system truncate operation to work properly. However, more file systems now support the punch hole feature, and it can benefit from mm supporting truncating a page just up to a certain point. Specifically, with this functionality truncate_inode_pages_range() can be changed so that it supports truncating a partial page at the end of the range (currently it will BUG_ON() if 'end' is not at the end of the page). This commit changes the invalidatepage() address space operation prototype to accept a range to be invalidated and updates all the instances of it. We also change block_invalidatepage() in the same way and actually make use of the new length argument by implementing range invalidation. Actual file system implementations will follow, except for the file systems where the changes are really simple and should not change the behaviour in any way. An implementation of truncate_page_range(), which will be able to accept page-unaligned ranges, will follow as well. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com>
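To make the offset-plus-length idea concrete, the sketch below computes which part of each page falls inside a byte range being punched or truncated. It is a user-space illustration under an assumed 4096-byte page size; `page_part()` and `struct part` are hypothetical and do not correspond to kernel functions.

```c
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size for this example */

/* The slice of one page that must be invalidated. */
struct part {
    unsigned int offset; /* first byte within the page */
    unsigned int length; /* number of bytes; 0 means the page is untouched */
};

/* Portion of page `index` that overlaps the byte range [start, end). */
static struct part page_part(uint64_t index, uint64_t start, uint64_t end)
{
    uint64_t pstart = index * PAGE_SZ;
    uint64_t pend = pstart + PAGE_SZ;
    uint64_t lo = start > pstart ? start : pstart;
    uint64_t hi = end < pend ? end : pend;
    struct part p = { 0, 0 };

    if (lo < hi) {
        p.offset = (unsigned int)(lo - pstart);
        p.length = (unsigned int)(hi - lo);
    }
    return p;
}
```

For the range [1000, 5000), page 0 gets offset 1000 and length 3096, and page 1 gets offset 0 and length 904. The second case ends before the end of the page, which is exactly what an offset-only interface cannot express.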
|
#
9b171e0c |
|
19-Feb-2013 |
Jan Kara <jack@suse.cz> |
ocfs2: fix possible use-after-free with AIO Running AIO is pinning inode in memory using file reference. Once AIO is completed using aio_complete(), file reference is put and inode can be freed from memory. So we have to be sure that calling aio_complete() is the last thing we do with the inode. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
496ad9aa |
|
23-Jan-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
new helper: file_inode(file) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
1269529b |
|
21-Feb-2013 |
Jan Kara <jack@suse.cz> |
ocfs2: wait for page writeback to provide stable pages When stable pages are required, we have to wait if the page is just going to disk and we want to modify it. Add proper callback to ocfs2_grab_pages_for_write(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c4bc8dcb |
|
25-Nov-2011 |
Cong Wang <amwang@redhat.com> |
ocfs2: remove the second argument of k[un]map_atomic() Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Cong Wang <amwang@redhat.com>
|
#
c7e25e6e |
|
23-Jun-2011 |
Jan Kara <jack@suse.cz> |
ocfs2: Avoid livelock in ocfs2_readpage() When someone writes to an inode, readers accessing the same inode via ocfs2_readpage() just busyloop trying to get ip_alloc_sem because do_generic_file_read() looks up the page again and retries ->readpage() when previous attempt failed with AOP_TRUNCATED_PAGE. When there are enough readers, they can occupy all CPUs and in non-preempt kernel the system is deadlocked because writer holding ip_alloc_sem is never run to release the semaphore. Fix the problem by making reader block on ip_alloc_sem to break the busy loop. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <jlbec@evilplan.org>
|
#
a11f7e63 |
|
22-Jun-2011 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: serialize unaligned aio Fix a corruption that can happen when we have (two or more) outstanding aio's to an overlapping unaligned region. Ext4 (e9e3bcecf44c04b9e6b505fd8e2eb9cea58fb94d) and xfs recently had to fix similar issues. In our case what happens is that we can have an outstanding aio on a region and if a write comes in with some bytes overlapping the original aio we may decide to read that region into a page before continuing (typically because of buffered-io fallback). Since we have no ordering guarantees with the aio, we can read stale or bad data into the page and then write it back out. If the i/o is page and block aligned, then we avoid this issue as there won't be any need to read data from disk. I took the same approach as Eric in the ext4 patch and introduced some serialization of unaligned async direct i/o. I don't expect this to have an effect on the most common cases of AIO. Unaligned aio will be slower though, but that's far more acceptable than data corruption. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <jlbec@evilplan.org>
|
#
5cffff9e |
|
24-Jul-2011 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: Fix ocfs2_page_mkwrite() This patch addresses two shortcomings in ocfs2_page_mkwrite(): 1. It makes the function return better VM_FAULT_* errors. 2. It handles an error that is triggered when a page is dropped from the mapping due to memory pressure. This patch locks the page to prevent that. [Patch was cleaned up by Sunil Mushran.] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
|
#
72c5052d |
|
24-Jun-2011 |
Christoph Hellwig <hch@infradead.org> |
fs: move inode_dio_done to the end_io handler For filesystems that delay their end_io processing we should keep our i_dio_count until the processing is done. Enable this by moving the inode_dio_done call to the end_io handler if one exists. Note that the actual move to the workqueue for ext4 and XFS is not done in this patch yet, but left to the filesystem maintainers. At least for XFS it's not needed yet either as XFS has an internal equivalent to i_dio_count. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
df2d6f26 |
|
24-Jun-2011 |
Christoph Hellwig <hch@infradead.org> |
fs: always maintain i_dio_count Maintain i_dio_count for all filesystems, not just those using DIO_LOCKING. This allows these filesystems to also protect truncate against direct I/O requests by using common code. Right now the only non-DIO_LOCKING filesystem that appears to do so is XFS, which uses an opencoded variant of the i_dio_count scheme. Behaviour doesn't change for filesystems never calling inode_dio_wait. For ext4, behaviour changes when using the dioread_nonlock option, which previously was missing any protection between truncate and direct I/O reads. For ocfs2, the handcrafted i_dio_count manipulations are replaced with the common code that is now enabled. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
bd5fe6c5 |
|
24-Jun-2011 |
Christoph Hellwig <hch@infradead.org> |
fs: kill i_alloc_sem i_alloc_sem is a rather special rw_semaphore. It's the last one that may be released by a non-owner, and its write side is always mirrored by real exclusion. Its intended use is to wait for all pending direct I/O requests to finish before starting a truncate. Replace it with a hand-grown construct: - exclusion for truncates is already guaranteed by i_mutex, so it can simply fall away - the reader side is replaced by an i_dio_count member in struct inode that counts the number of pending direct I/O requests. Truncate can't proceed as long as it's non-zero - when i_dio_count reaches zero we wake up a pending truncate using wake_up_bit on a new bit in i_flags - new references to i_dio_count can't appear while we are waiting for it to read zero because the direct I/O count always needs i_mutex (or an equivalent like XFS's i_iolock) for starting a new operation. This scheme is much simpler, and saves the space of a spinlock_t and a struct list_head in struct inode (typically 160 bits on a non-debug 64-bit system). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
272b62c1 |
|
17-Feb-2011 |
Goldwyn Rodrigues <rgoldwyn@gmail.com> |
Treat writes as new when holes span across page boundaries When a hole spans across page boundaries, the next write forces a read of the block. This could end up reading existing garbage data from the disk in ocfs2_map_page_blocks. This leads to non-zero holes. In order to avoid this, mark the writes as new when the holes span across page boundaries. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: jlbec <jlbec@evilplan.org>
|
#
7eaceacc |
|
10-Mar-2011 |
Jens Axboe <jaxboe@fusionio.com> |
block: remove per-queue plugging Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So let's kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
#
9558156b |
|
22-Feb-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove mlog(0) from fs/ocfs2/aops.c Remove all the "mlog(0," in fs/ocfs2/aops.c. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
c1e8d35e |
|
07-Mar-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove EXIT from masklog. mlog_exit is used to record the exit status of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can enable it for a production system or even for a test. This patch just tries to remove it or change it. So: 1. if all the error paths already use mlog_errno, it is just removed. Otherwise, it will be replaced by mlog_errno. 2. if it is used to print some return value, it is replaced with mlog(0,...). mlog_exit_ptr is changed to mlog(0. All those mlog(0,...) will be replaced with trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
ef6b689b |
|
20-Feb-2011 |
Tao Ma <boyu.mt@taobao.com> |
ocfs2: Remove ENTRY from masklog. ENTRY is used to record the entry of a function. But because it is added in so many functions, if we enable it, the system logs get filled up quickly and cause too much I/O. So actually no one can enable it for a production system or even for a test. So for mlog_entry_void, we just remove it. For mlog_entry(...), we replace it with mlog(0,...), and they will be replaced by trace events later. Signed-off-by: Tao Ma <boyu.mt@taobao.com>
|
#
50308d81 |
|
04-Nov-2010 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Try to free truncate log when meeting ENOSPC in write. Recently, one of our colleagues ran into a problem: if we repeatedly write and delete a 32 MB file, we eventually get ENOSPC. The corresponding bug is 1288. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288 The real problem is that although we have freed the clusters, they are still in the truncate log, where they are accumulated so that we can free them all at once. So this patch tries to resolve it. If we see -ENOSPC in ocfs2_write_begin_nolock, we check whether the truncate log has enough clusters for our need; if so, we flush the truncate log at that point and try again. This method was inspired by Mark Fasheh <mfasheh@suse.com>. Thanks. Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
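The retry described in the ENOSPC commit above can be modeled with a toy allocator. This is a sketch under assumptions: `struct toy_fs` and the `toy_*` functions are hypothetical and only mirror the control flow (allocate, check the truncate log on -ENOSPC, flush, retry once); they are not the ocfs2 API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of allocator state; not an ocfs2 structure. */
struct toy_fs {
    unsigned int free_clusters;          /* clusters the allocator can hand out now */
    unsigned int truncate_log_clusters;  /* freed clusters still parked in the truncate log */
};

/* Allocate 'want' clusters or fail with -ENOSPC. */
int toy_alloc(struct toy_fs *fs, unsigned int want)
{
    assert(fs != NULL);
    if (fs->free_clusters < want)
        return -ENOSPC;
    fs->free_clusters -= want;
    return 0;
}

/* Return every cluster held in the truncate log to the allocator. */
void toy_flush_truncate_log(struct toy_fs *fs)
{
    fs->free_clusters += fs->truncate_log_clusters;
    fs->truncate_log_clusters = 0;
}

/* On -ENOSPC, flush the truncate log if it can cover the request, then retry once. */
int toy_write_begin(struct toy_fs *fs, unsigned int want)
{
    int ret = toy_alloc(fs, want);

    if (ret == -ENOSPC && fs->truncate_log_clusters >= want) {
        toy_flush_truncate_log(fs);
        ret = toy_alloc(fs, want);
    }
    return ret;
}
```

Checking the log size before flushing avoids paying for a flush that could not satisfy the request anyway.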
#
39c99f12 |
|
06-Dec-2010 |
Tristan Ye <tristan.ye@oracle.com> |
Ocfs2: Teach 'coherency=full' O_DIRECT writes to correctly up_read i_alloc_sem. With the newly introduced 'coherency=full', O_DIRECT writes also take the EX rw_lock like buffered writes do (rw_level == 1). This confuses the use of 'level' in ocfs2_dio_end_io(), so i_alloc_sem fails to get up_read'd correctly. This patch teaches ocfs2_dio_end_io() to handle all the locking explicitly by introducing a new bit for i_alloc_sem in the iocb's private data, just like what we did for rw_lock. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
ebdec241 |
|
06-Oct-2010 |
Christoph Hellwig <hch@lst.de> |
fs: kill block_prepare_write __block_write_begin and block_prepare_write are identical except for slightly different calling conventions. Convert all callers to the __block_write_begin calling conventions and drop block_prepare_write. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
83fd9c7f |
|
10-Jun-2010 |
Goldwyn Rodrigues <rgoldwyn@gmail.com> |
Reorganize data elements to reduce struct sizes Thanks for the comments. I have incorporated them all. CONFIG_OCFS2_FS_STATS is enabled and CONFIG_DEBUG_LOCK_ALLOC is disabled. Statistics (old size - new size = bytes saved) now look like: ocfs2_write_ctxt: 2144 - 2136 = 8; ocfs2_inode_info: 1960 - 1848 = 112; ocfs2_journal: 168 - 160 = 8; ocfs2_lock_res: 336 - 304 = 32; ocfs2_refcount_tree: 512 - 472 = 40. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
15502712 |
|
11-Aug-2010 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Add struct file to ocfs2_refcount_cow. Add a new parameter 'struct file *' to ocfs2_refcount_cow so that we can add readahead support later. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
0378da0f |
|
11-Aug-2010 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: pass struct file* to ocfs2_write_begin_nolock. struct file * has file_ra_state to store the readahead state and data. So pass this to ocfs2_write_begin_nolock so that it can be used in ocfs2_refcount_cow. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
eafdc7d1 |
|
04-Jun-2010 |
Christoph Hellwig <hch@lst.de> |
sort out blockdev_direct_IO variants Move the call to vmtruncate to get rid of excessive blocks to the callers in preparation for the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it, as just open-coding the two additional parameters is shorter than the name suffix. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
552ef802 |
|
27-Jul-2010 |
Christoph Hellwig <hch@lst.de> |
direct-io: move aio_complete into ->end_io Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been committed. That means the aio_complete calls need to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
#
40e2e973 |
|
18-Jul-2010 |
Christoph Hellwig <hch@infradead.org> |
direct-io: move aio_complete into ->end_io Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been committed. That means the aio_complete calls need to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Alex Elder <aelder@sgi.com>
|
#
693c241a |
|
02-Jul-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: No need to zero pages past i_size. When ocfs2 fills a hole, it does so by allocating clusters. When a cluster is larger than the write, ocfs2 must zero the portions of the cluster outside of the write. If the clustersize is smaller than a pagecache page, this is handled by the normal pagecache mechanisms, but when the clustersize is larger than a page, ocfs2's write code will zero the pages adjacent to the write. This makes sure the entire cluster is zeroed correctly. Currently ocfs2 behaves exactly the same when writing past i_size. However, this means ocfs2 is writing zeroed pages for portions of a new cluster that are beyond i_size. The page writeback code isn't expecting this. It treats all pages past the one containing i_size as left behind due to a previous truncate operation. Thankfully, ocfs2 calculates the number of pages it will be working on up front. The rest of the write code merely honors the original calculation. We can simply trim the number of pages to only cover the actual file data. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
|
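The page-count trimming described in the "No need to zero pages past i_size" commit above comes down to a little index arithmetic. The following is a hedged sketch: `toy_trim_num_pages`, its parameters, and the 4 KiB page size are assumptions made for illustration and are not taken from the ocfs2 source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/*
 * start_index:       index of the first page of the cluster being written
 * pages_per_cluster: number of pages the cluster spans
 * write_end:         byte offset one past the last byte written (pos + len)
 * i_size:            current file size in bytes
 *
 * Returns how many pages of the cluster actually hold file data, so that
 * no zeroed pages are created beyond the end of the file.
 */
unsigned long toy_trim_num_pages(unsigned long start_index,
                                 unsigned long pages_per_cluster,
                                 uint64_t write_end, uint64_t i_size)
{
    uint64_t last_byte = write_end > i_size ? write_end : i_size;
    unsigned long end_index;

    assert(last_byte >= 1);
    /* Index one past the last page that contains file data. */
    end_index = (unsigned long)((last_byte - 1) >> TOY_PAGE_SHIFT) + 1;

    if (start_index + pages_per_cluster > end_index)
        return end_index - start_index;
    return pages_per_cluster;
}
```

For example, a 100-byte write into an empty file with 1 MiB clusters touches one page instead of all 256 pages of the cluster.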
#
5693486b |
|
01-Jul-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Zero the tail cluster when extending past i_size. ocfs2's allocation unit is the cluster. This can be larger than a block or even a memory page. This means that a file may have many blocks in its last extent that are beyond the block containing i_size. There also may be more unwritten extents after that. When ocfs2 grows a file, it zeros the entire cluster in order to ensure future i_size growth will see cleared blocks. Unfortunately, block_write_full_page() drops the pages past i_size. This means that ocfs2 is actually leaking garbage data into the tail end of that last cluster. This is a bug. We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect when a write or truncate is past i_size. They will use ocfs2_zero_extend() to ensure the data is properly zeroed. Older versions of ocfs2_zero_extend() simply zeroed every block between i_size and the zeroing position. This presumes three things: 1) There is allocation for all of these blocks. 2) The extents are not unwritten. 3) The extents are not refcounted. (1) and (2) hold true for non-sparse filesystems, which used to be the only users of ocfs2_zero_extend(). (3) is another bug. Since we're now using ocfs2_zero_extend() for sparse filesystems as well, we teach ocfs2_zero_extend() to check every extent between i_size and the zeroing position. If the extent is unwritten, it is ignored. If it is refcounted, it is CoWed. Then it is zeroed. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
|
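The per-extent decision that the "Zero the tail cluster" commit above describes for ocfs2_zero_extend() can be written as a small classifier. This is a sketch only: the flag values, the enum, and `toy_zero_extend_action` are hypothetical names, and holes (ranges with no allocation) are not modeled.

```c
#include <assert.h>

#define TOY_EXT_UNWRITTEN  0x1u  /* hypothetical flag values */
#define TOY_EXT_REFCOUNTED 0x2u

enum toy_zero_action {
    TOY_SKIP,           /* unwritten extent: ignored */
    TOY_ZERO,           /* plain written extent: zero it in place */
    TOY_COW_THEN_ZERO   /* refcounted extent: CoW first, then zero */
};

/* Decide what to do with one extent between i_size and the zeroing position. */
enum toy_zero_action toy_zero_extend_action(unsigned int flags)
{
    assert((flags & ~(TOY_EXT_UNWRITTEN | TOY_EXT_REFCOUNTED)) == 0);

    if (flags & TOY_EXT_UNWRITTEN)
        return TOY_SKIP;
    if (flags & TOY_EXT_REFCOUNTED)
        return TOY_COW_THEN_ZERO;
    return TOY_ZERO;
}
```

Testing the unwritten flag first means an extent that is both unwritten and refcounted is left alone, which matches the order in which the commit message lists the checks.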
#
a4bfb4cf |
|
06-Jul-2010 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: When zero extending, do it by page. ocfs2_zero_extend() does its zeroing block by block, but it calls a function named ocfs2_write_zero_page(). Let's have ocfs2_write_zero_page() handle the page level. From ocfs2_zero_extend()'s perspective, it is now page-at-a-time. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
|
#
4fe370af |
|
07-Dec-2009 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: use allocation reservations during file write Add a per-inode reservations structure and pass it through to the reservations code. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
5dd4056d |
|
03-Mar-2010 |
Christoph Hellwig <hch@infradead.org> |
dquot: cleanup space allocation / freeing routines Get rid of the alloc_space, free_space, reserve_space, claim_space and release_rsv dquot operations - they are always called from the filesystem and if a filesystem really needs their own (which none currently does) it can just call into its own routine directly. Move shared logic into the common __dquot_alloc_space, dquot_claim_space_nodirty and __dquot_free_space low-level methods, and rationalize the wrappers around it to move as much code as possible into the common block for CONFIG_QUOTA vs not. Also rename all these helpers to be named dquot_* instead of vfs_dq_*. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
|
#
cbaee472 |
|
25-Feb-2010 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Only bug out in direct io write for reflinked extent. In ocfs2_direct_IO_get_blocks, we only need to bug out when we are going to write a refcounted extent rec. What a silly bug introduced by me! Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
|
#
2bd63216 |
|
25-Jan-2010 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2/trivial: Remove trailing whitespaces Patch removes trailing whitespaces. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
1e431f5c |
|
03-Nov-2009 |
Christoph Hellwig <hch@lst.de> |
cleanup blockdev_direct_IO locking Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behaviour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
5fe878ae |
|
15-Dec-2009 |
Christoph Hellwig <hch@lst.de> |
direct-io: cleanup blockdev_direct_IO locking Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behaviour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Elder <aelder@sgi.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
b80474b4 |
|
10-Sep-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Use buffer IO if we are appending a file. In ocfs2_file_aio_write, we prevent direct io if we find that we are appending (changing i_size) and call generic_file_aio_write_nolock. But the O_DIRECT flag is still set, so this function eventually calls generic_file_direct_write, which updates i_size and leaves di->i_size alone. The bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1173. So this patch lets ocfs2_direct_IO return 0 directly if we are appending, so that buffered write is called and di->i_size gets updated successfully. And this is also what we want in ocfs2_file_aio_write. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
37f8a2bf |
|
25-Aug-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: CoW a reflinked cluster when it is truncated. When we truncate a file to a specific size which resides in a reflinked cluster, we need to CoW it since ocfs2_zero_range_for_truncate will zero the space after the size(just another type of write). So we add a "max_cpos" in ocfs2_refcount_cow so that it will stop when it hit the max cluster offset. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
293b2f70 |
|
24-Aug-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Integrate CoW in file write. When we use mmap, we CoW the refcounted clusters in ocfs2_write_begin_nolock. While for normal file io (including direct io), we do CoW in ocfs2_prepare_inode_for_write. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
6f70fa51 |
|
24-Aug-2009 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Add CoW support. This patch adds CoW support for a refcounted record. The whole process will be: 1. Calculate how many clusters we need to CoW and where we start. Extents that are not completely encompassed by the write will be broken on 1MB boundaries. 2. Do CoW for the clusters with the help of page cache. 3. Change the b-tree structure with the new allocated clusters. Signed-off-by: Tao Ma <tao.ma@oracle.com>
|
#
aa261f54 |
|
16-Sep-2009 |
Andi Kleen <andi@firstfloor.org> |
HWPOISON: Enable .remove_error_page for migration aware file systems Enable removing of corrupted pages through truncation for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs These should cover most server needs. I chose the set of migration aware file systems for this for now, assuming they have been especially audited. But in general it should be safe for all file systems on the data area that support read/write and truncate. Caveat: the hardware error handler does not take i_mutex for now before calling the truncate function. Is that ok? Cc: tytso@mit.edu Cc: hch@infradead.org Cc: mfasheh@suse.com Cc: aia21@cantab.net Cc: hugh.dickins@tiscali.co.uk Cc: swhiteho@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com>
|
#
5e404e9e |
|
13-Feb-2009 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree(). With this commit, extent tree operations are divorced from inodes and rely on ocfs2_caching_info. Phew! Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
0cf2f763 |
|
12-Feb-2009 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Pass struct ocfs2_caching_info to the journal functions. The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
8379e7c4 |
|
04-Sep-2009 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2: ocfs2_write_begin_nolock() should handle len=0 Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
e7432675 |
|
06-Aug-2009 |
Sunil Mushran <sunil.mushran@oracle.com> |
ocfs2: Initialize the cluster we're writing to in a non-sparse extend In a non-sparse extend, we correctly allocate (and zero) the clusters between the old_i_size and pos, but we don't zero the portions of the cluster we're writing to outside of pos<->len. It handles clustersize > pagesize and blocksize < pagesize. [Cleaned up by Joel Becker.] Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
1f4cea37 |
|
12-Jul-2009 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: Fail ocfs2_get_block() immediately when a block needs allocation ocfs2_get_block() does no allocation. Hole filling for writes should have happened farther up in the call chain. We detect this case and print an error, but we then continue with the function. We should be exiting immediately. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
cbfa9639 |
|
12-Jul-2009 |
Wengang Wang <wen.gang.wang@oracle.com> |
ocfs2: Fix error return in ocfs2_write_cluster() A typo caused ocfs2_write_cluster() to return 0 in some error cases. Fix it. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
|
#
1fca3a05 |
|
05-Mar-2009 |
Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> |
ocfs2: Pagecache usage optimization on ocfs2 A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
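The check described in the pagecache optimization commit above (a read can be served from a page that is not fully uptodate, provided every buffer covering the requested range is uptodate) can be sketched as follows. The function `toy_range_uptodate` and its array-of-flags model of a page are hypothetical; the kernel helper named in the commit works on buffer heads instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * block_uptodate: one flag per block (buffer) in the page
 * nblocks:        number of blocks in the page
 * block_size:     bytes per block
 * from, count:    byte range within the page that the reader wants
 *
 * Returns true only if every block overlapping [from, from + count)
 * is uptodate, so no read I/O would be needed for that range.
 */
bool toy_range_uptodate(const bool *block_uptodate, size_t nblocks,
                        size_t block_size, size_t from, size_t count)
{
    size_t first, last, i;

    assert(block_size > 0);
    if (count == 0 || from + count > nblocks * block_size)
        return false;

    first = from / block_size;
    last = (from + count - 1) / block_size;
    for (i = first; i <= last; i++) {
        if (!block_uptodate[i])
            return false;
    }
    return true;
}
```

This only pays off when the block size is smaller than the page size; with one block per page the check degenerates to the page's own uptodate state.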
#
d9ae49d6 |
|
04-Mar-2009 |
Tiger Yang <tiger.yang@oracle.com> |
ocfs2: tweak to get the maximum inline data size with xattr Replace max_inline_data with max_inline_data_with_xattr to ensure it is correct when xattrs are inlined. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
13723d00 |
|
17-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extended attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions, let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
a90714c1 |
|
09-Oct-2008 |
Jan Kara <jack@suse.cz> |
ocfs2: Add quota calls for allocation and freeing of inodes and space Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
53ef99ca |
|
18-Nov-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Remove JBD compatibility layer JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
b657c95c |
|
13-Nov-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Wrap inode block reads in a dedicated function. The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
0fcaa56a |
|
09-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Simplify ocfs2_read_block() More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
31d33073 |
|
09-Oct-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Require an inode for ocfs2_read_block(s)(). Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
a81cb88b |
|
07-Oct-2008 |
Mark Fasheh <mfasheh@suse.com> |
ocfs2: Don't check for NULL before brelse() This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
|
#
2b4e30fb |
|
03-Sep-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Switch over to JBD2. ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formatted for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
8d6220d6 |
|
22-Aug-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree() The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
f99b9b7c |
|
20-Aug-2008 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: Make ocfs2_extent_tree the first-class representation of a tree. We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
f56654c4 |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Add extent tree operation for xattr value btrees Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types, ocfs2_dinode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc will call ocfs2_dinode_insert_extent(), while the other two handle the xattr issue. The init of the extent tree is also handled by these functions. When storing an xattr value which is too large, we will allocate some clusters for it, and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added into ocfs2_extent_tree and it is used to indicate the root of ocfs2_extent_list. The reason is that we can't deduce the root from the buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse, in any place in an ocfs2_xattr_bucket. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
0eb8d47e |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Make high level btree extend code generic Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
e7d4cb6b |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Abstract ocfs2_extent_tree in b-tree operations. In the old extent tree operation, we take the hypothesis that we are using the ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will also use ocfs2_extent_list to store large value for a xattr entry, we refactor the tree operation so that xattr can use it directly. The refactoring includes 4 steps: 1. Abstract set/get of last_eb_blk and update_clusters since they may be stored in different location for dinode and xattr. 2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on. 3. Remove all the use of fe_bh and di, use root_bh and root_el in extent tree instead. So now all the fe_bh is replaced with et->root_bh, el with root_el accordingly. 4. Make ocfs2_lock_allocators generic. Now it is limited to be only used in file extend allocation. But the whole function is useful when we want to store large EAs. Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncate inode data btrees. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
811f933d |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
231b87d1 |
|
18-Aug-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Modify ocfs2_num_free_extents for future xattr usage. ocfs2_num_free_extents() is used to find the number of free extent records in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to use this for extended attribute trees in the future, so genericize the interface to take a buffer head. A future patch will allow that buffer_head to contain any structure rooting an ocfs2 btree. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
0e116227 |
|
02-Sep-2008 |
Tao Ma <tao.ma@oracle.com> |
ocfs2: Fix a bug in direct IO read. ocfs2 will become read-only if we try to read the bytes which pass the end of i_size. This can be easily reproduced by following steps: 1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse. 2. create a small file(say less than 100 bytes) and we will create the file which is allocated 1 cluster. 3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit. 4. The ocfs2 volume becomes read-only and dmesg shows: OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks: Inode 66010 has a hole at block 1 File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted. So suppress the ERROR message. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
961cecbe |
|
16-Jul-2008 |
Sunil Mushran <sunil.mushran@oracle.com> |
[PATCH] ocfs2: Fix oops when racing file truncates with writes into an mmap region This patch fixes an oops that is reproduced when one races writes to an mmap-ed region with another process truncating the file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
c0420ad2 |
|
30-Jun-2008 |
Coly Li <coyli@suse.de> |
[PATCH] ocfs2: fix oops in mmap_truncate testing This patch fixes a mmap_truncate bug which was found by the ocfs2 test suite. In an ocfs2 cluster with more than 1 node, run the program mmap_truncate, which races mmap writes and truncates from multiple processes. While the test is running, a stat from another node forces writeout, causing an oops in ocfs2_get_block() because it sees a buffer to write which isn't allocated. This patch fixes the bug by clearing the dirty and uptodate bits in the buffer, leaving the buffer unmapped, and returning. The fix was suggested by Mark Fasheh, and I coded up the patch. Signed-off-by: Coly Li <coyli@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
#
58dadcdb |
|
28-Mar-2008 |
Julia Lawall <julia@diku.dk> |
fs/ocfs2/aops.c: test for IS_ERR rather than 0 The function ocfs2_start_trans always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
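The reason a test for 0 misses these failures can be seen in a small user-space sketch of the error-pointer convention. The names sketch_err_ptr(), sketch_is_err(), sketch_ptr_err() and SKETCH_MAX_ERRNO are hypothetical stand-ins written for this illustration; they imitate the idea behind the kernel's ERR_PTR/IS_ERR helpers and are not the kernel's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: error codes lie in [-4095, -1]. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (hypothetical helper). */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range reserved for encoded errors. */
static inline bool sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno value from an encoded pointer. */
static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}
```

An encoded error such as sketch_err_ptr(-12) is not NULL, so a check of the form "if (!handle)" never fires for it; only the range test in sketch_is_err() detects the failure.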
|
#
86c838b0 |
|
26-Feb-2008 |
Julia Lawall <julia@diku.dk> |
[PATCH] fs/ocfs2/aops.c: Correct use of ! and & In commit e6bafba5b4765a5a252f1b8d31cbf6d2459da337, a bug was fixed that involved converting !x & y to !(x & y). The code below shows the same pattern, and thus should perhaps be fixed in the same way. This is not tested and clearly changes the semantics, so it is only something to consider. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
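The precedence issue behind this pattern can be sketched in plain C. SKETCH_FLAG and both function names are hypothetical, chosen only for this illustration; they are not taken from aops.c. Because unary ! binds more tightly than binary &, the expression !x & y is parsed as (!x) & y, which is written out explicitly below.

```c
#include <stdbool.h>

/* Hypothetical flag bit; any bit other than bit 0 shows the problem. */
#define SKETCH_FLAG 0x2u

/* How C parses "!flags & SKETCH_FLAG": (!flags) is 0 or 1, so ANDing it
 * with 0x2 always yields 0, whatever the value of flags. */
static inline bool flag_is_clear_buggy(unsigned int flags)
{
	return (!flags) & SKETCH_FLAG;
}

/* The intended test: true exactly when the flag bit is not set. */
static inline bool flag_is_clear_fixed(unsigned int flags)
{
	return !(flags & SKETCH_FLAG);
}
```

With these definitions the buggy form returns false for every input, while the fixed form returns true for 0x0 and 0x1 and false for 0x2, which is why the change alters semantics rather than being a pure cleanup.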
|
#
eebd2aa3 |
|
04-Feb-2008 |
Christoph Lameter <clameter@sgi.com> |
Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user Simplify page cache zeroing of segments of pages through 3 functions zero_user_segments(page, start1, end1, start2, end2) Zeros two segments of the page. It takes the position where to start and end the zeroing which avoids length calculations and makes code clearer. zero_user_segment(page, start, end) Same for a single segment. zero_user(page, start, length) Length variant for the case where we know the length. We remove the zero_user_page macro. Issues: 1. It's a macro. Inline functions are preferable. 2. The KM_USER0 macro is only defined for HIGHMEM. Having to treat this special case everywhere makes the code needlessly complex. The parameter for zeroing is always KM_USER0 except in one single case that we open code. Avoiding KM_USER0 means a lot of code no longer has to deal with the special casing for HIGHMEM. Dealing with kmap is only necessary for HIGHMEM configurations. In those configurations we use KM_USER0 like we do for a series of other functions defined in highmem.h. Since KM_USER0 depends on HIGHMEM the existing zero_user_page function could not be a macro. zero_user_* functions introduced here can be inline because that constant is not used when these functions are called. Also extract the flushing of the caches to be outside of the kmap. [akpm@linux-foundation.org: fix nfs and ntfs build] [akpm@linux-foundation.org: fix ntfs build some more] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Steven French <sfrench@us.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
d2849fb2 |
|
19-Dec-2007 |
Jan Kara <jack@suse.cz> |
ocfs2: Safer read_inline_data() In ocfs2_read_inline_data() we should store file size in loff_t. Although the file size should fit in 32 bits we cannot be sure in case filesystem is corrupted. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
628a24f5 |
|
30-Oct-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Readpages support Add ->readpages support to Ocfs2. This is rather trivial - all it required is a small update to ocfs2_get_block (for mapping full extents via b_size) and an ocfs2_readpages() function which partially mirrors ocfs2_readpage(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e63aecb6 |
|
18-Oct-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Rename ocfs2_meta_[un]lock Call this the "inode_lock" now, since it covers both data and meta data. This patch makes no functional changes. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
c934a92d |
|
18-Oct-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove data locks The meta lock now covers both meta data and data, so this just removes the now-redundant data lock. Combining locks saves us a round of lock mastery per inode and one less lock to ping between nodes during read/write. We don't lose much - since meta locks were always held before a data lock (and at the same level) ordered writeout mode (the default) ensured that flushing for the meta data lock also pushed out data anyways. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
0d8a4e0c |
|
20-Nov-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix comparison in ocfs2_size_fits_inline_data() This was causing us to prematurely push out inline data by one byte. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
4e9563fd |
|
01-Nov-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: fix write() performance regression On file systems which don't support sparse files, ocfs2_map_page_blocks() was reading blocks on appending writes. This caused write performance to suffer dramatically. Fix this by detecting an appending write on a nonsparse fs and skipping the read. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
b6af1bcd |
|
16-Oct-2007 |
Nick Piggin <npiggin@suse.de> |
ocfs2: convert to new aops Plug ocfs2 into the ->write_begin and ->write_end aops. A bunch of custom code is now gone - the iovec iteration stuff during write and the ocfs2 splice write actor. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
1afc32b9 |
|
07-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Write support for inline data This fixes up write, truncate, mmap, and RESVSP/UNRESVSP to understand inline inode data. For the most part, the changes to the core write code can be relied on to do the heavy lifting. Any code calling ocfs2_write_begin (including shared writeable mmap) can count on it doing the right thing with respect to growing inline data to an extent tree. Size reducing truncates, including UNRESVSP, can simply zero that portion of the inode block being removed. Size increasing truncates, including RESVSP, have to be a little bit smarter and grow the inode to an extent tree if necessary. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
6798d35a |
|
07-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Read support for inline data This hooks up ocfs2_readpage() to populate a page with data from an inode block. Direct IO reads from inline data are modified to fall back to buffered I/O. Appropriate checks are also placed in the extent map code to avoid reading an extent list when inline data might be stored. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
1d410a6e |
|
07-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Small refactor of truncate zeroing code We'll want to reuse most of this when pushing inline data back out to an extent. Keeping this part as a separate patch helps to keep the upcoming changes for write support uncluttered. The core portion of ocfs2_zero_cluster_pages() responsible for making sure a page is mapped and properly dirtied is abstracted out into its own function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change, though zeroing becomes optional. We also turn part of ocfs2_free_write_ctxt() into a common function for unlocking and freeing a page array. This operation is very common (and uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense to keep the code in one place. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
65ed39d6 |
|
28-Aug-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: move nonsparse hole-filling into ocfs2_write_begin() By doing this, we can remove any higher level logic which has to have knowledge of btree functionality - any callers of ocfs2_write_begin() can now expect it to do anything necessary to prepare the inode for new data. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Reviewed-by: Joel Becker <joel.becker@oracle.com>
|
#
5c26a7b7 |
|
18-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Don't double set write parameters The target page offsets were being incorrectly set a second time in ocfs2_prepare_page_for_write(), which was causing problems on a 16k page size kernel. Additionally, ocfs2_write_failure() was incorrectly using those parameters instead of the parameters for the individual page being cleaned up. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
db56246c |
|
17-Sep-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix pos/len passed to ocfs2_write_cluster This was broken for file systems whose cluster size is greater than page size. Pos needs to be incremented as we loop through the descriptors, and len needs to be capped to the size of a single cluster. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
30b8548f |
|
05-Sep-2007 |
tao.ma@oracle.com <tao.ma@oracle.com> |
[PATCH] ocfs2: Fix a wrong cluster calculation. In ocfs2_alloc_write_ctxt, the number of clusters to write is calculated from the byte length only. This causes problems if a write starts near the end of one cluster and extends into a second cluster while "len" is smaller than the cluster size. In that case, we actually have to write 2 clusters. So we have to take the start position into consideration as well. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
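The arithmetic behind this fix can be illustrated with a small stand-alone sketch. Both helpers below are hypothetical, written for this illustration, and are not the functions in aops.c; cluster_bits is assumed to be log2 of the cluster size and len is assumed to be greater than zero.

```c
#include <stdint.h>

/* Length-only estimate: rounds len up to whole clusters and ignores
 * where inside a cluster the write begins. */
static inline uint32_t clusters_from_len_only(uint64_t len,
					      unsigned int cluster_bits)
{
	uint64_t cluster_size = 1ULL << cluster_bits;

	return (uint32_t)((len + cluster_size - 1) >> cluster_bits);
}

/* Position-aware count: index of the last cluster touched minus the
 * index of the first cluster touched, plus one. Requires len > 0. */
static inline uint32_t clusters_spanned(uint64_t pos, uint64_t len,
					unsigned int cluster_bits)
{
	uint64_t first = pos >> cluster_bits;
	uint64_t last = (pos + len - 1) >> cluster_bits;

	return (uint32_t)(last - first + 1);
}
```

For a 4k cluster size (cluster_bits = 12), a 200-byte write at offset 4000 covers bytes 4000 through 4199, which crosses the boundary at 4096: the length-only estimate gives 1 cluster, while the position-aware count gives 2.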
|
#
54cb8821 |
|
19-Jul-2007 |
Nick Piggin <npiggin@suse.de> |
mm: merge populate and nopage into fault (fixes nonlinear) Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes the virtual address -> file offset differently from linear mappings. ->populate is a layering violation because the filesystem/pagecache code should not need to know anything about the virtual memory mapping. The hitch here is that the ->nopage handler didn't pass down enough information (ie. pgoff). But it is more logical to pass pgoff rather than have the ->nopage function calculate it itself anyway (because that's a similar layering violation). Having the populate handler install the pte itself is likewise a nasty thing to be doing. This patch introduces a new fault handler that replaces ->nopage and ->populate and (later) ->nopfn. Most of the old mechanism is still in place so there is a lot of duplication and nice cleanups that can be removed if everyone switches over. The rationale for doing this in the first place is that nonlinear mappings are subject to the pagefault vs invalidate/truncate race too, and it seemed stupid to duplicate the synchronisation logic rather than just consolidate the two. After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in pagecache. Seems like a fringe functionality anyway. NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no users have hit mainline yet. [akpm@linux-foundation.org: cleanup] [randy.dunlap@oracle.com: doc. fixes for readahead] [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
54c57dc3 |
|
20-Jun-2007 |
Eric Sandeen <sandeen@redhat.com> |
[PATCH] ocfs2: zero_user_page conversion Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
2ae99a60 |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Support creation of unwritten extents This can now be trivially supported with re-use of our existing extend code. ocfs2_allocate_unwritten_extents() takes a start offset and a byte length and iterates over the inode, adding extents (marked as unwritten) until len is reached. Existing extents are skipped over. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
b27b7cbc |
|
18-Jun-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: support writing of unwritten extents Update the write code to detect when the user is asking to write to an unwritten extent. Like writing to a hole, we must zero the region between the write and the cluster boundaries. Most of the existing cluster zeroing logic can be re-used with some additional checks for the unwritten flag on extent records. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
0d172baa |
|
14-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: small cleanup of ocfs2_write_begin_nolock() We can easily separate out the write descriptor setup and manipulation into helper functions. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
59a5e416 |
|
22-Jun-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: plug truncate into cached dealloc routines Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
bce99768 |
|
18-Jun-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: harden buffer check during mapping of page blocks We don't want to submit buffer_new blocks for read i/o. This actually won't happen right now because those requests during an allocating write are all nicely aligned. It's probably a good idea to provide an explicit check though. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
7307de80 |
|
09-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: shared writeable mmap Implement cluster consistent shared writeable mappings using the ->page_mkwrite() callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
607d44aa |
|
09-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: factor out write aops into nolock variants ocfs2_mkwrite() will want this so that it can add some mmap specific checks before asking for a write. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
3a307ffc |
|
08-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: rework ocfs2_buffered_write_cluster() Use some ideas from the new-aops patch series and turn ocfs2_buffered_write_cluster() into a 2 stage operation with the caller copying data in between. The code now understands multiple cluster writes as a result of having to deal with a full page write for greater than 4k pages. This sets us up to easily call into the write path during ->page_mkwrite(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
eeb47d12 |
|
06-Jun-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix invalid assertion during write on 64k pages The write path code intends to bug if a math error (or unhandled case) results in a write outside of the current cluster boundaries. The actual BUG_ON() statements however are incorrect, leading to a crash on kernels with 64k page size. Fix those by checking against the right variables. Also, move the assertions higher up within the functions so that they trip *before* the code starts to mark buffers. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
5c3c6bb7 |
|
10-May-2007 |
Nate Diller <nate.diller@gmail.com> |
[PATCH] ocfs2: use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller <nate.diller@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e9dfc0b2 |
|
14-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: trylock in ocfs2_readpage() Similarly to the page lock / cluster lock inversion in ocfs2_readpage, we can deadlock on ip_alloc_sem. We can down_read_trylock() instead and just return AOP_TRUNCATED_PAGE if the operation fails. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
9315f130 |
|
01-May-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Force use of GFP_NOFS in ocfs2_write() We can otherwise recurse into the file system. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
1ca1a111 |
|
27-Apr-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: fix sparse warnings in fs/ocfs2 None of these are actually harmful, but the noise makes looking for real problems difficult. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
6cb129f5 |
|
26-Apr-2007 |
Adrian Bunk <bunk@stusta.de> |
[PATCH] fs/ocfs2/: make 3 functions static This patch makes the following needlessly global functions static: - aops.c: ocfs2_write_data_page() - dlmglue.c: ocfs2_dump_meta_lvb_info() - file.c: ocfs2_set_inode_size() Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
7cdfc3a1 |
|
16-Apr-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remember rw lock level during direct io Cluster locking might have been redone because a direct write won't complete, so this needs to be reflected in the iocb. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
8110b073 |
|
22-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix up i_blocks calculation to know about holes Older file systems which didn't support holes did a dumb calculation of i_blocks based on i_size. This is no longer accurate, so fix things up to take actual allocation into account. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
4f902c37 |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Fix extent lookup to return true size of holes Initially, we had wired things to return a size '1' of holes. Cook up a small amount of code to find the next extent and calculate the number of clusters between the virtual offset and the next allocated extent. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
49cb8d2d |
|
09-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Read from an unwritten extent returns zeros Return an optional extent flags field from our lookup functions and wire up callers to treat unwritten regions as holes for the purpose of returning zeros to the user. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
6af67d82 |
|
06-Mar-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Use own splice write actor We need to fill holes during a splice write. Provide our own splice write actor which can call ocfs2_file_buffered_write() with a splice-specific callback. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
60b11392 |
|
16-Feb-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: zero tail of sparse files on truncate Since we don't zero on extend anymore, truncate needs to be fixed up to zero the part of a file between i_size and the end of its cluster. Otherwise a subsequent extend could expose bad data. This introduces a new helper, which can be used in ocfs2_write(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
25baf2da |
|
14-Feb-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Teach ocfs2_get_block() about holes ocfs2_get_block() didn't understand sparse files, fix that. Also remove some code that isn't really useful anymore. We can fix up ocfs2_direct_IO_get_blocks() at the same time. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
5069120b |
|
09-Feb-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: remove ocfs2_prepare_write() and ocfs2_commit_write() These are no longer used, and can't handle file systems with sparse file allocation. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
9517bac6 |
|
09-Feb-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: teach ocfs2_file_aio_write() about sparse files Unfortunately, ocfs2 can no longer make use of generic_file_aio_write_nlock() because allocating writes will require zeroing of pages adjacent to the I/O for cluster sizes greater than page size. Implement a custom file write here, which can order page locks for zeroing. This also has the advantage that cluster locks can easily be ordered outside of the page locks. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
363041a5 |
|
17-Jan-2007 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: temporarily remove extent map caching The code in extent_map.c is not prepared to deal with a subtree being rotated between lookups. This can happen when filling holes in sparse files. Instead of a lengthy patch to update the code (which would likely lose the benefit of caching subtree roots), we remove most of the algorithms and implement a simple path based lookup. A less ambitious extent caching scheme will be added in a later patch. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
03f981cf |
|
04-Jan-2007 |
Joel Becker <joel.becker@oracle.com> |
ocfs2: add some missing address space callbacks Under load, OCFS2 would crash in invalidate_inode_pages2_range() because invalidate_complete_page2() was unable to invalidate a page. It would appear that JBD is holding on to the page. ext3 has a specific ->releasepage() handler to cover this case. Steal ext3's ->releasepage(), ->invalidatepage(), and ->migratepage(), as they appear completely appropriate for OCFS2. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
564f8a32 |
|
14-Dec-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Allow direct I/O read past end of file ocfs2_direct_IO_get_blocks() was incorrectly returning -EIO for a direct I/O read whose start block was past the end of the file allocation tree. Fix things so that we return a hole instead. do_direct_IO() will then notice that the range start is past eof and return a short read. While there, remove the unused vbo_max variable. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
d28c9174 |
|
08-Dec-2006 |
Josef Sipek <jsipek@fsl.cs.sunysb.edu> |
[PATCH] struct path: convert ocfs2 Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
1fabe148 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t This is mostly a search and replace as ocfs2_journal_handle is now no more than a container for a handle_t pointer. ocfs2_commit_trans() becomes very straightforward, and we remove some out of date comments / code. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
65eff9cc |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: remove handle argument to ocfs2_start_trans() All callers either pass in NULL directly, or a local variable that is already set to NULL. The internals of ocfs2_start_trans() get a nice cleanup as a result. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
02dc1af4 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: pass ocfs2_super * into ocfs2_commit_trans() This sets us up to remove handle->journal. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
4bcec184 |
|
09-Oct-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: remove unused handle argument from ocfs2_meta_lock_full() Now that this is unused and all callers pass NULL, we can safely remove it. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
e0b4096d |
|
11-Jul-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: properly update i_mtime on buffered write We weren't always updating i_mtime on writes, so fix ocfs2_commit_write() to handle this. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Acked-by: Zach Brown <zach.brown@oracle.com>
|
#
184d7d20 |
|
03-Jun-2006 |
Florin Malita <fmalita@gmail.com> |
ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks() Signed-off-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
f5e54d6e |
|
28-Jun-2006 |
Christoph Hellwig <hch@lst.de> |
[PATCH] mark address_space_operations const Same as we already do with the file operations: keep them in .rodata and prevent people from doing runtime patching. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
53013cba |
|
05-May-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: take data locks around extend We need to take a data lock around extends to protect the pages that ocfs2_zero_extend is going to be pulling into the page cache. Otherwise an extend on one node might populate the page cache with data pages that have no lock coverage. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
1d8fa7a2 |
|
26-Mar-2006 |
Badari Pulavarty <pbadari@us.ibm.com> |
[PATCH] remove ->get_blocks() support Now that get_block() can handle mapping multiple disk blocks, there is no need to have ->get_blocks(). This patch removes the fs specific ->get_blocks() added for DIO and makes its users use get_block() instead. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
#
b0697053 |
|
03-Mar-2006 |
Mark Fasheh <mark.fasheh@oracle.com> |
ocfs2: don't use MLF* in the file system Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
|
#
ccd979bd |
|
15-Dec-2005 |
Mark Fasheh <mark.fasheh@oracle.com> |
[PATCH] OCFS2: The Second Oracle Cluster Filesystem The OCFS2 file system module. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>
|