History log of /linux-master/fs/xfs/xfs_qm.c
Revision Date Author Comments
# 841a5f87 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: report quota block corruption errors to the health system

Whenever we encounter corrupt quota blocks, we should report that to the
health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 20049187 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: track quota updates during live quotacheck

Create a shadow dqtrx system in the quotacheck code that hooks the
regular dquot counter update code. This will be the means to keep our
copy of the dquot counters up to date while the scan runs in real time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 3d8f1426 22-Feb-2024 Darrick J. Wong <djwong@kernel.org>

xfs: report the health of quota counts

Report the health of quota counts.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 3fed24ff 19-Feb-2024 Matthew Wilcox (Oracle) <willy@infradead.org>

xfs: Replace xfs_isilocked with xfs_assert_ilocked

To use the new rwsem_assert_held()/rwsem_assert_held_write(), we can't
use the existing ASSERT macro. Add a new xfs_assert_ilocked() and
convert all the callers.

Fix an apparent bug in xfs_isilocked(): If the caller specifies
XFS_IOLOCK_EXCL | XFS_ILOCK_EXCL, xfs_assert_ilocked() will check both
the IOLOCK and the ILOCK are held for write. xfs_isilocked() only
checked that the ILOCK was held for write.

xfs_assert_ilocked() is always on, even if DEBUG or XFS_WARN aren't
defined. It's a cheap check, so I don't think it's worth defining
it away.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# 94a69db2 15-Jan-2024 Dave Chinner <dchinner@redhat.com>

xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS

In the past we've had problems with lockdep false positives stemming
from inode locking occurring in memory reclaim contexts (e.g. from
superblock shrinkers). Lockdep doesn't know that inodes access from
above memory reclaim cannot be accessed from below memory reclaim
(and vice versa) but there has never been a good solution to solving
this problem with lockdep annotations.

This situation isn't unique to inode locks - buffers are also locked
above and below memory reclaim, and we have to maintain lock
ordering for them - and against inodes - appropriately. IOWs, the
same code paths and locks are taken both above and below memory
reclaim and so we always need to make sure the lock orders are
consistent. We are spared the lockdep problems this might cause
by the fact that semaphores and bit locks aren't covered by lockdep.

In general, this sort of lockdep false positive detection is cause
by code that runs GFP_KERNEL memory allocation with an actively
referenced inode locked. When it is run from a transaction, memory
allocation is automatically GFP_NOFS, so we don't have reclaim
recursion issues. So in the places where we do memory allocation
with inodes locked outside of a transaction, we have explicitly set
them to use GFP_NOFS allocations to prevent lockdep false positives
from being reported if the allocation dips into direct memory
reclaim.

More recently, __GFP_NOLOCKDEP was added to the memory allocation
flags to tell lockdep not to track that particular allocation for
the purposes of reclaim recursion detection. This is a much better
way of preventing false positives - it allows us to use GFP_KERNEL
context outside of transactions, and allows direct memory reclaim to
proceed normally without throwing out false positive deadlock
warnings.

The obvious places that lock inodes and do memory allocation are the
lookup paths and inode extent list initialisation. These occur in
non-transactional GFP_KERNEL contexts, and so can run direct reclaim
and lock inodes.

This patch makes a first path through all the explicit GFP_NOFS
allocations in XFS and converts the obvious ones to GFP_KERNEL |
__GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
allocations from the XFS code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# d4c75a1b 15-Jan-2024 Dave Chinner <dchinner@redhat.com>

xfs: convert remaining kmem_free() to kfree()

The remaining callers of kmem_free() are freeing heap memory, so
we can convert them directly to kfree() and get rid of kmem_free()
altogether.

This conversion was done with:

$ for f in `git grep -l kmem_free fs/xfs`; do
> sed -i s/kmem_free/kfree/ $f
> done
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# f078d4ea 15-Jan-2024 Dave Chinner <dchinner@redhat.com>

xfs: convert kmem_alloc() to kmalloc()

kmem_alloc() is just a thin wrapper around kmalloc() these days.
Convert everything to use kmalloc() so we can get rid of the
wrapper.

Note: the transaction region allocation in xlog_add_to_transaction()
can be a high order allocation. Converting it to use
kmalloc(__GFP_NOFAIL) results in warnings in the page allocation
code being triggered because the mm subsystem does not want us to
use __GFP_NOFAIL with high order allocations like we've been doing
with the kmem_alloc() wrapper for a couple of decades. Hence this
specific case gets converted to xlog_kvmalloc() rather than
kmalloc() to avoid this issue.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# 10634530 15-Jan-2024 Dave Chinner <dchinner@redhat.com>

xfs: convert kmem_zalloc() to kzalloc()

There's no reason to keep the kmem_zalloc() around anymore, it's
just a thin wrapper around kmalloc(), so lets get rid of it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# 0a97c01c 30-Nov-2023 Nhat Pham <nphamcs@gmail.com>

list_lru: allow explicit memcg and NUMA node selection

Patch series "workload-specific and memory pressure-driven zswap
writeback", v8.

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to
perform worload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:

https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).

This patch series solves these issues by separating the global zswap LRU
into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e
memcg- and NUMA-aware) zswap writeback under memory pressure. The new
shrinker does not have any parameter that must be tuned by the user, and
can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the
linux kernel in a memory-limited cgroup, and allocate some cold data in
tmpfs to see if the shrinker could write them out and improved the overall
performance. Depending on the amount of cold data generated, we observe
from 14% to 35% reduction in kernel CPU time used in the kernel builds.


This patch (of 6):

The interface of list_lru is based on the assumption that the list node
and the data it represents belong to the same allocated on the correct
node/memcg. While this assumption is valid for existing slab objects LRU
such as dentries and inodes, it is undocumented, and rather inflexible for
certain potential list_lru users (such as the upcoming zswap shrinker and
the THP shrinker). It has caused us a lot of issues during our
development.

This patch changes list_lru interface so that the caller must explicitly
specify numa node and memcg when adding and removing objects. The old
list_lru_add() and list_lru_del() are renamed to list_lru_add_obj() and
list_lru_del_obj(), respectively.

It also extends the list_lru API with a new function, list_lru_putback,
which undoes a previous list_lru_isolate call. Unlike list_lru_add, it
does not increment the LRU node count (as list_lru_isolate does not
decrement the node count). list_lru_putback also allows for explicit
memcg and NUMA node selection.

Link: https://lkml.kernel.org/r/20231130194023.4102148-1-nphamcs@gmail.com
Link: https://lkml.kernel.org/r/20231130194023.4102148-2-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# fe885278 11-Sep-2023 Qi Zheng <zhengqi.arch@bytedance.com>

xfs: dynamically allocate the xfs-qm shrinker

In preparation for implementing lockless slab shrink, use new APIs to
dynamically allocate the xfs-qm shrinker, so that it can be freed
asynchronously via RCU. Then it doesn't need to wait for RCU read-side
critical section when releasing the struct xfs_quotainfo.

Link: https://lkml.kernel.org/r/20230911094444.68966-37-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Chandan Babu R <chandan.babu@oracle.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Chuck Lever <cel@kernel.org>
Cc: Coly Li <colyli@suse.de>
Cc: Dai Ngo <Dai.Ngo@oracle.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Marijn Suijten <marijn.suijten@somainline.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nadav Amit <namit@vmware.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Olga Kornievskaia <kolga@netapp.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sean Paul <sean@poorly.run>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Song Liu <song@kernel.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Yue Hu <huyue2@coolpad.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 537c013b 24-Sep-2023 Darrick J. Wong <djwong@kernel.org>

xfs: fix reloading entire unlinked bucket lists

During review of the patcheset that provided reloading of the incore
iunlink list, Dave made a few suggestions, and I updated the copy in my
dev tree. Unfortunately, I then got distracted by ... who even knows
what ... and forgot to backport those changes from my dev tree to my
release candidate branch. I then sent multiple pull requests with stale
patches, and that's what was merged into -rc3.

So.

This patch re-adds the use of an unlocked iunlink list check to
determine if we want to allocate the resources to recreate the incore
list. Since lost iunlinked inodes are supposed to be rare, this change
helps us avoid paying the transaction and AGF locking costs every time
we open any inode.

This also re-adds the shutdowns on failure, and re-applies the
restructuring of the inner loop in xfs_inode_reload_unlinked_bucket, and
re-adds a requested comment about the quotachecking code.

Retain the original RVB tag from Dave since there's no code change from
the last submission.

Fixes: 68b957f64fca1 ("xfs: load uncached unlinked inodes into memory on demand")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 49813a21 11-Sep-2023 Darrick J. Wong <djwong@kernel.org>

xfs: make inode unlinked bucket recovery work with quotacheck

Teach quotacheck to reload the unlinked inode lists when walking the
inode table. This requires extra state handling, since it's possible
that a reloaded inode will get inactivated before quotacheck tries to
scan it; in this case, we need to ensure that the reloaded inode does
not have dquots attached when it is freed.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 0c7273e4 05-Mar-2023 Dave Chinner <dchinner@redhat.com>

xfs: quotacheck failure can race with background inode inactivation

The background inode inactivation can attached dquots to inodes, but
this can race with a foreground quotacheck failure that leads to
disabling quotas and freeing the mp->m_quotainfo structure. The
background inode inactivation then tries to allocate a quota, tries
to dereference mp->m_quotainfo, and crashes like so:

XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
BUG: kernel NULL pointer dereference, address: 00000000000002a8
....
CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
....
Call Trace:
<TASK>
xfs_qm_dqread+0x46/0x440
xfs_qm_dqget_inode+0x154/0x500
xfs_qm_dqattach_one+0x142/0x3c0
xfs_qm_dqattach_locked+0x14a/0x170
xfs_qm_dqattach+0x52/0x80
xfs_inactive+0x186/0x340
xfs_inodegc_worker+0xd3/0x430
process_one_work+0x3b1/0x960
worker_thread+0x52/0x660
kthread+0x161/0x1a0
ret_from_fork+0x29/0x50
</TASK>
....

Prevent this race by flushing all the queued background inode
inactivations pending before purging all the cached dquots when
quotacheck fails.

Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# f2d40141 12-Jan-2023 Christian Brauner <brauner@kernel.org>

fs: port inode_init_owner() to mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# e195605e 03-Jan-2023 Li zeming <zeming@nfschina.com>

xfs: xfs_qm: remove unnecessary ‘0’ values from error

error is assigned first, so it does not need to initialize the
assignment.

Signed-off-by: Li zeming <zeming@nfschina.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 52f31ed2 08-Dec-2022 Dave Chinner <dchinner@redhat.com>

xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING

Resulting in a UAF if the shrinker races with some other dquot
freeing mechanism that sets XFS_DQFLAG_FREEING before the dquot is
removed from the LRU. This can occur if a dquot purge races with
drop_caches.

Reported-by: syzbot+912776840162c13db1a3@syzkaller.appspotmail.com
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# f0c2d7d2 03-Aug-2022 Darrick J. Wong <djwong@kernel.org>

xfs: fix intermittent hang during quotacheck

Every now and then, I see the following hang during mount time
quotacheck when running fstests. Turning on KASAN seems to make it
happen somewhat more frequently. I've edited the backtrace for brevity.

XFS (sdd): Quotacheck needed: Please wait.
XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 0 PID: 1831409 Comm: mount Tainted: G W 5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
FS: 00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
Call Trace:
<TASK>
xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
get_tree_bdev+0x175/0x280
vfs_get_tree+0x1a/0x80
path_mount+0x6f5/0xaa0
__x64_sys_mount+0x103/0x140
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

I /think/ this can happen if xfs_qm_flush_one is racing with
xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
dquot buffer, let alone queued it to the delwri list. In this case,
flush_one will fail to get the dquot flush lock, but it can lock the
incore buffer, but xfs_buf_delwri_pushbuf will then trip over this
ASSERT, which checks that the buffer isn't on a delwri list. The hang
results because the _delwri_submit_buffers ignores non DELWRI_Q buffers,
which means that xfs_buf_iowait waits forever for an IO that has not yet
been scheduled.

AFAICT, a reasonable solution here is to detect a dquot buffer that is
not on a DELWRI list, drop it, and return -EAGAIN to try the flush
again. It's not /that/ big of a deal if quotacheck writes the dquot
buffer repeatedly before we even set QUOTA_CHKD.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# e33c267a 31-May-2022 Roman Gushchin <roman.gushchin@linux.dev>

mm: shrinkers: provide shrinkers with names

Currently shrinkers are anonymous objects. For debugging purposes they
can be identified by count/scan function names, but it's not always
useful: e.g. for superblock's shrinkers it's nice to have at least an
idea of to which superblock the shrinker belongs.

This commit adds names to shrinkers. register_shrinker() and
prealloc_shrinker() functions are extended to take a format and arguments
to master a name.

In some cases it's not possible to determine a good name at the time when
a shrinker is allocated. For such cases shrinker_debugfs_rename() is
provided.

The expected format is:
<subsystem>-<shrinker_type>[:<instance>]-<id>
For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

After this change the shrinker debugfs directory looks like:
$ cd /sys/kernel/debug/shrinker/
$ ls
dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42
mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43
mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44
rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49
sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13
sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36
sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19
sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10
sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9
sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37
sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38
sb-dax-11 sb-proc-45 sb-tmpfs-35
sb-debugfs-7 sb-proc-46 sb-tmpfs-40

[roman.gushchin@linux.dev: fix build warnings]
Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 732436ef 09-Jul-2022 Darrick J. Wong <djwong@kernel.org>

xfs: convert XFS_IFORK_PTR to a static inline helper

We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 85c73bf7 07-Jul-2022 Dave Chinner <dchinner@redhat.com>

xfs: rework xfs_buf_incore() API

Make it consistent with the other buffer APIs to return a error and
the buffer is placed in a parameter.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 86d40f1e 26-May-2022 Darrick J. Wong <djwong@kernel.org>

xfs: purge dquots after inode walk fails during quotacheck

xfs/434 and xfs/436 have been reporting occasional memory leaks of
xfs_dquot objects. These tests themselves were the messenger, not the
culprit, since they unload the xfs module, which trips the slub
debugging code while tearing down all the xfs slab caches:

=============================================================================
BUG xfs_dquot (Tainted: G W ): Objects remaining in xfs_dquot on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
CPU: 0 PID: 3953166 Comm: modprobe Tainted: G W 5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014

Since we don't generally rmmod the xfs module between fstests, this
means that xfs/434 is really just the canary in the coal mine --
something leaked a dquot, but we don't know who. After days of pounding
on fstests with kmemleak enabled, I finally got it to spit this out:

unreferenced object 0xffff8880465654c0 (size 536):
comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s)
hex dump (first 32 bytes):
60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff `JVF....X..\....
00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00 ..RI............
backtrace:
[<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs]
[<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs]
[<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs]
[<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs]
[<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs]
[<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs]
[<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs]
[<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs]
[<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs]
[<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs]
[<ffffffff81170bb2>] process_one_work+0x672/0x1040
[<ffffffff81171b1b>] worker_thread+0x59b/0xec0
[<ffffffff8118711e>] kthread+0x29e/0x340
[<ffffffff810032bf>] ret_from_fork+0x1f/0x30

Now we know that quotacheck is at fault, but even this report was
canaryish -- it was triggered by xfs/494, which doesn't actually mount
any filesystems. (kmemleak can be a little slow to notice leaks, even
with fstests repeatedly whacking it to look for them.) Looking at the
*previous* fstest, however, showed that the test run before xfs/494 was
xfs/117. The tipoff to the problem is in this excerpt from dmesg:

XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN..............
00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68 ..........WTT.Lh
00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00 ..}.m...4.}.m...
00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00 4.}.m...........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab ................
00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03 .....W{.........
00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08 ................
XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.

The dinode verifier decided that the inode was corrupt, which causes
iget to return with EFSCORRUPTED. Since this happened during
quotacheck, it is obvious that the kernel aborted the inode walk on
account of the corruption error and disabled quotas. Unfortunately, we
neglect to purge the dquot cache before doing that, which is how the
dquots leaked.

The problems started 10 years ago in commit b84a3a, when the dquot lists
were converted to a radix tree, but the error handling behavior was not
correctly preserved -- in that commit, if the bulkstat failed and
usrquota was enabled, the bulkstat failure code would be overwritten by
the result of flushing all the dquots to disk. As long as that
succeeds, we'd continue the quota mount as if everything were ok, but
instead we're now operating with a corrupt inode and incorrect quota
usage counts. I didn't notice this bug in 2019 when I wrote commit
ebd126a, which changed quotacheck to skip the dqflush when the scan
doesn't complete due to inode walk failures.

Introduced-by: b84a3a96751f ("xfs: remove the per-filesystem list of dquots")
Fixes: ebd126a651f8 ("xfs: convert quotacheck to use the new iwalk functions")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 5cc21e52 10-May-2022 Catherine Hoang <catherine.hoang@oracle.com>

xfs: remove quota warning limit from struct xfs_quota_limits

Warning limits in xfs quota is an unused feature that is currently
documented as unimplemented, and it is unclear what the intended
behavior of these limits are. Remove the ‘warn’ field from struct
xfs_quota_limits and any other related code.

Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 01728b44 17-Mar-2022 Dave Chinner <dchinner@redhat.com>

xfs: xfs_is_shutdown vs xlog_is_shutdown cage fight

I've been chasing a recent resurgence in generic/388 recovery
failure and/or corruption events. The events have largely been
uninitialised inode chunks being tripped over in log recovery
such as:

XFS (pmem1): User initiated shutdown received.
pmem1: writeback error on inode 12621949, offset 1019904, sector 12968096
XFS (pmem1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xa3/0xf0 (fs/xfs/xfs_fsops.c:500). Shutting down filesystem.
XFS (pmem1): Please unmount the filesystem and rectify the problem(s)
XFS (pmem1): Unmounting Filesystem
XFS (pmem1): Mounting V5 Filesystem
XFS (pmem1): Starting recovery (logdev: internal)
XFS (pmem1): bad inode magic/vsn daddr 8723584 #0 (magic=1818)
XFS (pmem1): Metadata corruption detected at xfs_inode_buf_verify+0x180/0x190, xfs_inode block 0x851c80 xfs_inode_buf_verify
XFS (pmem1): Unmount and run xfs_repair
XFS (pmem1): First 128 bytes of corrupted metadata buffer:
00000000: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000010: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000020: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000030: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000040: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000050: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000060: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
00000070: 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 ................
XFS (pmem1): metadata I/O error in "xlog_recover_items_pass2+0x52/0xc0" at daddr 0x851c80 len 32 error 117
XFS (pmem1): log mount/recovery failed: error -117
XFS (pmem1): log mount failed

There have been isolated random other issues, too - xfs_repair fails
because it finds some corruption in symlink blocks, rmap
inconsistencies, etc - but they are nowhere near as common as the
uninitialised inode chunk failure.

The problem has clearly happened at runtime before recovery has run;
I can see the ICREATE log item in the log shortly before the
actively recovered range of the log. This means the ICREATE was
definitely created and written to the log, but for some reason the
tail of the log has been moved past the ordered buffer log item that
tracks INODE_ALLOC buffers and, supposedly, prevents the tail of the
log moving past the ICREATE log item before the inode chunk buffer
is written to disk.

Tracing the fsstress processes that are running when the filesystem
shut down immediately pin-pointed the problem:

user shutdown marks xfs_mount as shutdown

godown-213341 [008] 6398.022871: console: [ 6397.915392] XFS (pmem1): User initiated shutdown received.
.....

aild tries to push ordered inode cluster buffer

xfsaild/pmem1-213314 [001] 6398.022974: xfs_buf_trylock: dev 259:1 daddr 0x851c80 bbcount 0x20 hold 16 pincount 0 lock 0 flags DONE|INODES|PAGES caller xfs_inode_item_push+0x8e
xfsaild/pmem1-213314 [001] 6398.022976: xfs_ilock_nowait: dev 259:1 ino 0x851c80 flags ILOCK_SHARED caller xfs_iflush_cluster+0xae

xfs_iflush_cluster() checks xfs_is_shutdown(), returns true,
calls xfs_iflush_abort() to kill writeback of the inode.
Inode is removed from AIL, drops cluster buffer reference.

xfsaild/pmem1-213314 [001] 6398.022977: xfs_ail_delete: dev 259:1 lip 0xffff88880247ed80 old lsn 7/20344 new lsn 7/21000 type XFS_LI_INODE flags IN_AIL
xfsaild/pmem1-213314 [001] 6398.022978: xfs_buf_rele: dev 259:1 daddr 0x851c80 bbcount 0x20 hold 17 pincount 0 lock 0 flags DONE|INODES|PAGES caller xfs_iflush_abort+0xd7

.....

All inodes on cluster buffer are aborted, then the cluster buffer
itself is aborted and removed from the AIL *without writeback*:

xfsaild/pmem1-213314 [001] 6398.023011: xfs_buf_error_relse: dev 259:1 daddr 0x851c80 bbcount 0x20 hold 2 pincount 0 lock 0 flags ASYNC|DONE|STALE|INODES|PAGES caller xfs_buf_ioend_fail+0x33
xfsaild/pmem1-213314 [001] 6398.023012: xfs_ail_delete: dev 259:1 lip 0xffff8888053efde8 old lsn 7/20344 new lsn 7/20344 type XFS_LI_BUF flags IN_AIL

The inode buffer was at 7/20344 when it was removed from the AIL.

xfsaild/pmem1-213314 [001] 6398.023012: xfs_buf_item_relse: dev 259:1 daddr 0x851c80 bbcount 0x20 hold 2 pincount 0 lock 0 flags ASYNC|DONE|STALE|INODES|PAGES caller xfs_buf_item_done+0x31
xfsaild/pmem1-213314 [001] 6398.023012: xfs_buf_rele: dev 259:1 daddr 0x851c80 bbcount 0x20 hold 2 pincount 0 lock 0 flags ASYNC|DONE|STALE|INODES|PAGES caller xfs_buf_item_relse+0x39

.....

Userspace is still running, doing stuff. an fsstress process runs
syncfs() or sync() and we end up in sync_fs_one_sb() which issues
a log force. This pushes on the CIL:

fsstress-213322 [001] 6398.024430: xfs_fs_sync_fs: dev 259:1 m_features 0x20000000019ff6e9 opstate (clean|shutdown|inodegc|blockgc) s_flags 0x70810000 caller sync_fs_one_sb+0x26
fsstress-213322 [001] 6398.024430: xfs_log_force: dev 259:1 lsn 0x0 caller xfs_fs_sync_fs+0x82
fsstress-213322 [001] 6398.024430: xfs_log_force: dev 259:1 lsn 0x5f caller xfs_log_force+0x7c
<...>-194402 [001] 6398.024467: kmem_alloc: size 176 flags 0x14 caller xlog_cil_push_work+0x9f

And the CIL fills up iclogs with pending changes. This picks up
the current tail from the AIL:

<...>-194402 [001] 6398.024497: xlog_iclog_get_space: dev 259:1 state XLOG_STATE_ACTIVE refcnt 1 offset 0 lsn 0x0 flags caller xlog_write+0x149
<...>-194402 [001] 6398.024498: xlog_iclog_switch: dev 259:1 state XLOG_STATE_ACTIVE refcnt 1 offset 0 lsn 0x700005408 flags caller xlog_state_get_iclog_space+0x37e
<...>-194402 [001] 6398.024521: xlog_iclog_release: dev 259:1 state XLOG_STATE_WANT_SYNC refcnt 1 offset 32256 lsn 0x700005408 flags caller xlog_write+0x5f9
<...>-194402 [001] 6398.024522: xfs_log_assign_tail_lsn: dev 259:1 new tail lsn 7/21000, old lsn 7/20344, last sync 7/21448

And it moves the tail of the log to 7/21000 from 7/20344. This
*moves the tail of the log beyond the ICREATE transaction* that was
at 7/20344 and pinned by the inode cluster buffer that was cancelled
above.

....

godown-213341 [008] 6398.027005: xfs_force_shutdown: dev 259:1 tag logerror flags log_io|force_umount file fs/xfs/xfs_fsops.c line_num 500
godown-213341 [008] 6398.027022: console: [ 6397.915406] pmem1: writeback error on inode 12621949, offset 1019904, sector 12968096
godown-213341 [008] 6398.030551: console: [ 6397.919546] XFS (pmem1): Log I/O Error (0x6) detected at xfs_fs_goingdown+0xa3/0xf0 (fs/

And finally the log itself is now shutdown, stopping all further
writes to the log. But this is too late to prevent the corruption
that moving the tail of the log forwards after we start cancelling
writeback causes.

The fundamental problem here is that we are using the wrong shutdown
checks for log items. We've long conflated mount shutdown with log
shutdown state, and I started separating that recently with the
atomic shutdown state changes in commit b36d4651e165 ("xfs: make
forced shutdown processing atomic"). The changes in that commit
series are directly responsible for being able to diagnose this
issue because it clearly separated mount shutdown from log shutdown.

Essentially, once we start cancelling writeback of log items and
removing them from the AIL because the filesystem is shut down, we
*cannot* update the journal because we may have cancelled the items
that pin the tail of the log. That moves the tail of the log
forwards without having written the metadata back, hence we have
corrupt in memory state and writing to the journal propagates that
to the on-disk state.

What commit b36d4651e165 makes clear is that log item state needs to
change relative to log shutdown, not mount shutdown. IOWs, anything
that aborts metadata writeback needs to check log shutdown state
because log items directly affect log consistency. Having them check
mount shutdown state introduces the above race condition where we
cancel metadata writeback before the log shuts down.

To fix this, this patch works through all log items and converts
shutdown checks to use xlog_is_shutdown() rather than
xfs_is_shutdown(), so that we don't start aborting metadata
writeback before we shut off journal writes.

AFAICT, this race condition is a zero day IO error handling bug in
XFS that dates back to the introduction of XLOG_IO_ERROR,
XLOG_STATE_IOERROR and XFS_FORCED_SHUTDOWN back in January 1997.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 11a83f4c 11-Oct-2021 Christoph Hellwig <hch@lst.de>

xfs: remove the xfs_dqblk_t typedef

Remove the few leftover instances of the xfs_dinode_t typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 75c8c50f 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown

Remove the shouty macro and instead use the inline function that
matches other state/feature check wrapper naming. This conversion
was done with sed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 38c26bfd 18-Aug-2021 Dave Chinner <dchinner@redhat.com>

xfs: replace xfs_sb_version checks with feature flag checks

Convert the xfs_sb_version_hasfoo() to checks against
mp->m_features. Checks of the superblock itself during disk
operations (e.g. in the read/write verifiers and the to/from disk
formatters) are not converted - they operate purely on the
superblock state. Everything else should use the mount features.

Large parts of this conversion were done with sed with commands like
this:

for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
done

With manual cleanups for things like "xfs_has_extflgbit" and other
little inconsistencies in naming.

The result is ia lot less typing to check features and an XFS binary
size reduced by a bit over 3kB:

$ size -t fs/xfs/built-in.a
text data bss dec hex filenam
before 1130866 311352 484 1442702 16038e (TOTALS)
after 1127727 311352 484 1439563 15f74b (TOTALS)

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 108523b8 06-Aug-2021 Darrick J. Wong <djwong@kernel.org>

xfs: queue inactivation immediately when quota is nearing enforcement

Now that we have made the inactivation of unlinked inodes a background
task to increase the throughput of file deletions, we need to be a
little more careful about how long of a delay we can tolerate.

Specifically, if the dquots attached to the inode being inactivated are
nearing any kind of enforcement boundary, we want to queue that
inactivation work immediately so that users don't get EDQUOT/ENOSPC
errors even after they deleted a bunch of files to stay within quota.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 149e53af 06-Aug-2021 Christoph Hellwig <hch@lst.de>

xfs: remove the active vs running quota differentiation

These only made a difference when quotaoff supported disabling quota
accounting on a mounted file system, so we can switch everyone to use
a single set of flags and helpers now. Note that the *QUOTA_ON naming
for the helpers is kept as it was the much more commonly used one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# e497dfba 06-Aug-2021 Christoph Hellwig <hch@lst.de>

xfs: remove the flags argument to xfs_qm_dquot_walk

We always purge all dquots now, so drop the argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 40b52225 06-Aug-2021 Christoph Hellwig <hch@lst.de>

xfs: remove support for disabling quota accounting on a mounted file system

Disabling quota accounting is hairy, racy code with all kinds of pitfalls.
And it has a very strange mind set, as quota accounting (unlike
enforcement) really is a propery of the on-disk format. There is no good
use case for supporting this.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# b652afd9 01-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: get rid of xfs_dir_ialloc()

This is just a simple wrapper around the per-ag inode allocation
that doesn't need to exist. The internal mechanism to select and
allocate within an AG does not need to be exposed outside
xfs_ialloc.c, and it being exposed simply makes it harder to follow
the code and simplify it.

This is simplified by internalising xf_dialloc_select_ag() and
xfs_dialloc_ag() into a single xfs_dialloc() function and then
xfs_dir_ialloc() can go away.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 9bbafc71 01-Jun-2021 Dave Chinner <dchinner@redhat.com>

xfs: move xfs_perag_get/put to xfs_ag.[ch]

They are AG functions, not superblock functions, so move them to the
appropriate location.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>


# 862a804a 13-Apr-2021 Christoph Hellwig <hch@lst.de>

xfs: move the XFS_IFEXTENTS check into xfs_iread_extents

Move the XFS_IFEXTENTS check from the callers into xfs_iread_extents to
simplify the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# 6e73a545 29-Mar-2021 Christoph Hellwig <hch@lst.de>

xfs: move the di_nblocks field to struct xfs_inode

In preparation of removing the historic icinode struct, move the nblocks
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# ceaf603c 29-Mar-2021 Christoph Hellwig <hch@lst.de>

xfs: move the di_projid field to struct xfs_inode

In preparation of removing the historic icinode struct, move the projid
field into the containing xfs_inode structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>


# e6a688c3 22-Mar-2021 Dave Chinner <dchinner@redhat.com>

xfs: initialise attr fork on inode create

When we allocate a new inode, we often need to add an attribute to
the inode as part of the create. This can happen as a result of
needing to add default ACLs or security labels before the inode is
made visible to userspace.

This is highly inefficient right now. We do the create transaction
to allocate the inode, then we do an "add attr fork" transaction to
modify the just created empty inode to set the inode fork offset to
allow attributes to be stored, then we go and do the attribute
creation.

This means 3 transactions instead of 1 to allocate an inode, and
this greatly increases the load on the CIL commit code, resulting in
excessive contention on the CIL spin locks and performance
degradation:

18.99% [kernel] [k] __pv_queued_spin_lock_slowpath
3.57% [kernel] [k] do_raw_spin_lock
2.51% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
2.48% [kernel] [k] memcpy
2.34% [kernel] [k] xfs_log_commit_cil

The typical profile resulting from running fsmark on a selinux enabled
filesytem is adds this overhead to the create path:

- 15.30% xfs_init_security
- 15.23% security_inode_init_security
- 13.05% xfs_initxattrs
- 12.94% xfs_attr_set
- 6.75% xfs_bmap_add_attrfork
- 5.51% xfs_trans_commit
- 5.48% __xfs_trans_commit
- 5.35% xfs_log_commit_cil
- 3.86% _raw_spin_lock
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
- 0.70% xfs_trans_alloc
0.52% xfs_trans_reserve
- 5.41% xfs_attr_set_args
- 5.39% xfs_attr_set_shortform.constprop.0
- 4.46% xfs_trans_commit
- 4.46% __xfs_trans_commit
- 4.33% xfs_log_commit_cil
- 2.74% _raw_spin_lock
- do_raw_spin_lock
__pv_queued_spin_lock_slowpath
0.60% xfs_inode_item_format
0.90% xfs_attr_try_sf_addname
- 1.99% selinux_inode_init_security
- 1.02% security_sid_to_context_force
- 1.00% security_sid_to_context_core
- 0.92% sidtab_entry_to_string
- 0.90% sidtab_sid2str_get
0.59% sidtab_sid2str_put.part.0
- 0.82% selinux_determine_inode_label
- 0.77% security_transition_sid
0.70% security_compute_sid.part.0

And fsmark creation rate performance drops by ~25%. The key point to
note here is that half the additional overhead comes from adding the
attribute fork to the newly created inode. That's crazy, considering
we can do this same thing at inode create time with a couple of
lines of code and no extra overhead.

So, if we know we are going to add an attribute immediately after
creating the inode, let's just initialise the attribute fork inside
the create transaction and chop that whole chunk of code out of
the create fast path. This completely removes the performance
drop caused by enabling SELinux, and the profile looks like:

- 8.99% xfs_init_security
- 9.00% security_inode_init_security
- 6.43% xfs_initxattrs
- 6.37% xfs_attr_set
- 5.45% xfs_attr_set_args
- 5.42% xfs_attr_set_shortform.constprop.0
- 4.51% xfs_trans_commit
- 4.54% __xfs_trans_commit
- 4.59% xfs_log_commit_cil
- 2.67% _raw_spin_lock
- 3.28% do_raw_spin_lock
3.08% __pv_queued_spin_lock_slowpath
0.66% xfs_inode_item_format
- 0.90% xfs_attr_try_sf_addname
- 0.60% xfs_trans_alloc
- 2.35% selinux_inode_init_security
- 1.25% security_sid_to_context_force
- 1.21% security_sid_to_context_core
- 1.19% sidtab_entry_to_string
- 1.20% sidtab_sid2str_get
- 0.86% sidtab_sid2str_put.part.0
- 0.62% _raw_spin_lock_irqsave
- 0.77% do_raw_spin_lock
__pv_queued_spin_lock_slowpath
- 0.84% selinux_determine_inode_label
- 0.83% security_transition_sid
0.86% security_compute_sid.part.0

Which indicates the XFS overhead of creating the selinux xattr has
been halved. This doesn't fix the CIL lock contention problem, just
means it's not a limiting factor for this workload. Lock contention
in the security subsystems is going to be an issue soon, though...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: fix compilation error when CONFIG_SECURITY=n]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>


# f736d93d 21-Jan-2021 Christoph Hellwig <hch@lst.de>

xfs: support idmapped mounts

Enable idmapped mounts for xfs. This basically just means passing down
the user_namespace argument from the VFS methods down to where it is
passed to the relevant helpers.

Note that full-filesystem bulkstat is not supported from inside idmapped
mounts as it is an administrative operation that acts on the whole file
system. The limitation is not applied to the bulkstat single operation
that just operates on a single inode.

Link: https://lore.kernel.org/r/20210121131959.646623-40-christian.brauner@ubuntu.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>


# 5c615f0f 01-Feb-2021 Darrick J. Wong <djwong@kernel.org>

xfs: remove xfs_qm_vop_chown_reserve

Now that the only caller of this function is xfs_trans_alloc_ichange,
just open-code the meat of _chown_reserve in that caller. Drop the
(redundant) [ugp]id checks because xfs has a 1:1 relationship between
quota ids and incore dquots.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 1aecf373 29-Jan-2021 Darrick J. Wong <djwong@kernel.org>

xfs: fix chown leaking delalloc quota blocks when fssetxattr fails

While refactoring the quota code to create a function to allocate inode
change transactions, I noticed that xfs_qm_vop_chown_reserve does more
than just make reservations: it also *modifies* the incore counts
directly to handle the owner id change for the delalloc blocks.

I then observed that the fssetxattr code continues validating input
arguments after making the quota reservation but before dirtying the
transaction. If the routine decides to error out, it fails to undo the
accounting switch! This leads to incorrect quota reservation and
failure down the line.

We can fix this by making the reservation function do only that -- for
the new dquot, it reserves ondisk and delalloc blocks to the
transaction, and the old dquot hangs on to its incore reservation for
now. Once we actually switch the dquots, we can then update the incore
reservations because we've dirtied the transaction and it's too late to
turn back now.

No fixes tag because this has been broken since the start of git.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 1abcf261 09-Dec-2020 Dave Chinner <dchinner@redhat.com>

xfs: move on-disk inode allocation out of xfs_ialloc()

So xfs_ialloc() will only address in-core inode allocation then,
Also, rename xfs_ialloc() to xfs_dir_ialloc_init() in order to
keep everything in xfs_inode.c under the same namespace.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 97611f93 06-Oct-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: do the ASSERT for the arguments O_{u,g,p}dqpp

If we pass in XFS_QMOPT_{U,G,P}QUOTA flags and different uid/gid/prid
than them currently associated with the inode, the arguments
O_{u,g,p}dqpp shouldn't be NULL, so add the ASSERT for them.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 74af4c17 22-Sep-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: remove the unused parameter id from xfs_qm_dqattach_one

Since we never use the second parameter id, so remove it from
xfs_qm_dqattach_one() function.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 5aff6750 16-Sep-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: remove the unnecessary xfs_dqid_t type cast

Since the type prid_t and xfs_dqid_t both are uint32_t, seems the
type cast is unnecessary, so remove it.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 06dbf82b 24-Aug-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: trace timestamp limits

Add a couple of tracepoints so that we can check the timestamp limits
being set on inodes and quotas.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 4ea1ff3b 17-Aug-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: widen ondisk quota expiration timestamps to handle y2038+

Enable the bigtime feature for quota timers. We decrease the accuracy
of the timers to ~4s in exchange for being able to set timers up to the
bigtime maximum.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 11d8a919 17-Aug-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor quota expiration timer modification

Define explicit limits on the range of quota grace period expiration
timeouts and refactor the code that modifies the timeouts into helpers
that clamp the values appropriately. Note that we'll refactor the
default grace period timer separately.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# d8c1af0d 15-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rename the ondisk dquot d_flags to d_type

The ondisk dquot stores the quota record type in the flags field.
Rename this field to d_type to make the _type relationship between the
ondisk and incore dquot more obvious.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 1a7ed271 15-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: create xfs_dqtype_t to represent quota types

Create a new type (xfs_dqtype_t) to represent the type of an incore
dquot (user, group, project, or none). Rename the incore dquot's
dq_flags field to q_type.

This allows us to replace all the "uint type" arguments to the quota
functions with "xfs_dqtype_t type", to make it obvious when we're
passing a quota type argument into a function.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 8cd4901d 15-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rename XFS_DQ_{USER,GROUP,PROJ} to XFS_DQTYPE_*

We're going to split up the incore dquot state flags from the ondisk
dquot flags (eventually renaming this "type") so start by renaming the
three flags and the bitmask that are going to participate in this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# f9751c4a 15-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: drop the type parameter from xfs_dquot_verify

xfs_qm_reset_dqcounts (aka quotacheck) is the only xfs_dqblk_verify
caller that actually knows the specific quota type that it's looking
for. Since everything else just pass in type==0 (including the buffer
verifier), drop the parameter and open-code the check like
xfs_dquot_from_disk already does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c8c753e1 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove unnecessary arguments from quota adjust functions

struct xfs_dquot already has a pointer to the xfs mount, so remove the
redundant parameter from xfs_qm_adjust_dq*.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 438769e3 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor default quota limits by resource

Now that we've split up the dquot resource fields into separate structs,
do the same for the default limits to enable further refactoring.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 51dbb1be 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove qcore from incore dquots

Now that we've stopped using qcore entirely, drop it from the incore
dquot.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 19dce7ea 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: stop using q_core timers in the quota code

Add timers fields to the incore dquot, and use that instead of the ones
in qcore. This eliminates a bunch of endian conversions and will
eventually allow us to remove qcore entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# c8c45fb2 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: stop using q_core warning counters in the quota code

Add warning counter fields to the incore dquot, and use that instead of
the ones in qcore. This eliminates a bunch of endian conversions and
will eventually allow us to remove qcore entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# be37d40c 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: stop using q_core counters in the quota code

Add counter fields to the incore dquot, and use that instead of the ones
in qcore. This eliminates a bunch of endian conversions and will
eventually allow us to remove qcore entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# d3537cf9 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: stop using q_core limits in the quota code

Add limits fields in the incore dquot, and use that instead of the ones
in qcore. This eliminates a bunch of endian conversions and will
eventually allow us to remove qcore entirely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# 784e80f5 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: use a per-resource struct for incore dquot data

Introduce a new struct xfs_dquot_res that we'll use to track all the
incore data for a particular resource type (block, inode, rt block).
This will help us (once we've eliminated q_core) to declutter quota
functions that currently open-code field access or pass around fields
around explicitly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# c51df733 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: stop using q_core.d_id in the quota code

Add a dquot id field to the incore dquot, and use that instead of the
one in qcore. This eliminates a bunch of endian conversions and will
eventually allow us to remove qcore entirely.

We also rearrange the start of xfs_dquot to remove padding holes, saving
8 bytes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>


# 985a78fd 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rename dquot incore state flags

Rename the existing incore dquot "dq_flags" field to "q_flags" to match
everything else in the structure, then move the two actual dquot state
flags to the XFS_DQFLAG_ namespace from XFS_DQ_.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>


# 0dcc0728 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor quotacheck flags usage

We only use the XFS_QMOPT flags in quotacheck to signal the quota type,
so rip out all the flags handling and just pass the type all the way
through.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>


# c97738a9 14-Jul-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: clear XFS_DQ_FREEING if we can't lock the dquot buffer to flush

In commit 8d3d7e2b35ea, we changed xfs_qm_dqpurge to bail out if we
can't lock the dquot buf to flush the dquot. This prevents the AIL from
blocking on the dquot, but it also forgets to clear the FREEING flag on
its way out. A subsequent purge attempt will see the FREEING flag is
set and bail out, which leads to dqpurge_all failing to purge all the
dquots.

(copy-pasting from Dave Chinner's identical patch)

This was found by inspection after having xfs/305 hang 1 in ~50
iterations in a quotaoff operation:

[ 8872.301115] xfs_quota D13888 92262 91813 0x00004002
[ 8872.302538] Call Trace:
[ 8872.303193] __schedule+0x2d2/0x780
[ 8872.304108] ? do_raw_spin_unlock+0x57/0xd0
[ 8872.305198] schedule+0x6e/0xe0
[ 8872.306021] schedule_timeout+0x14d/0x300
[ 8872.307060] ? __next_timer_interrupt+0xe0/0xe0
[ 8872.308231] ? xfs_qm_dqusage_adjust+0x200/0x200
[ 8872.309422] schedule_timeout_uninterruptible+0x2a/0x30
[ 8872.310759] xfs_qm_dquot_walk.isra.0+0x15a/0x1b0
[ 8872.311971] xfs_qm_dqpurge_all+0x7f/0x90
[ 8872.313022] xfs_qm_scall_quotaoff+0x18d/0x2b0
[ 8872.314163] xfs_quota_disable+0x3a/0x60
[ 8872.315179] kernel_quotactl+0x7e2/0x8d0
[ 8872.316196] ? __do_sys_newstat+0x51/0x80
[ 8872.317238] __x64_sys_quotactl+0x1e/0x30
[ 8872.318266] do_syscall_64+0x46/0x90
[ 8872.319193] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8872.320490] RIP: 0033:0x7f46b5490f2a
[ 8872.321414] Code: Bad RIP value.

Returning -EAGAIN from xfs_qm_dqpurge() without clearing the
XFS_DQ_FREEING flag means the xfs_qm_dqpurge_all() code can never
free the dquot, and we loop forever waiting for the XFS_DQ_FREEING
flag to go away on the dquot that leaked it via -EAGAIN.

Fixes: 8d3d7e2b35ea ("xfs: trylock underlying buffer on dquot flush")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# e850301f 21-May-2020 Eric Sandeen <sandeen@redhat.com>

xfs: per-type quota timers and warn limits

Move timers and warnings out of xfs_quotainfo and into xfs_def_quota
so that we can utilize them on a per-type basis, rather than enforcing
them based on the values found in the first enabled quota type.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
[zlang: new way to get defquota in xfs_qm_init_timelimits]
[zlang: remove redundant defq assign]
Signed-off-by: Zorro Lang <zlang@redhat.com>

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# ce6e7e79 21-May-2020 Eric Sandeen <sandeen@redhat.com>

xfs: switch xfs_get_defquota to take explicit type

xfs_get_defquota() currently takes an xfs_dquot, and from that obtains
the type of default quota we should get (user/group/project).

But early in init, we don't have access to a fully set up quota, so
that's not possible. The next patch needs go set up default quota
timers early, so switch xfs_get_defquota to take an explicit type
and add a helper function to obtain that type from an xfs_dquot
for the existing callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 3dbb9aa3 21-May-2020 Eric Sandeen <sandeen@redhat.com>

xfs: pass xfs_dquot to xfs_qm_adjust_dqtimers

Pass xfs_dquot rather than xfs_disk_dquot to xfs_qm_adjust_dqtimers;
this makes it symmetric with xfs_qm_adjust_dqlimits and will help
the next patch.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# dcf1ccc9 21-May-2020 Eric Sandeen <sandeen@redhat.com>

xfs: always return -ENOSPC on project quota reservation failure

XFS project quota treats project hierarchies as "mini filesysems" and
so rather than -EDQUOT, the intent is to return -ENOSPC when a quota
reservation fails, but this behavior is not consistent.

The only place we make a decision between -EDQUOT and -ENOSPC
returns based on quota type is in xfs_trans_dqresv().

This behavior is currently controlled by whether or not the
XFS_QMOPT_ENOSPC flag gets passed into the quota reservation. However,
its use is not consistent; paths such as xfs_create() and xfs_symlink()
don't set the flag, so a reservation failure will return -EDQUOT for
project quota reservation failures rather than -ENOSPC for these sorts
of operations, even for project quota:

# mkdir mnt/project
# xfs_quota -x -c "project -s -p mnt/project 42" mnt
# xfs_quota -x -c 'limit -p isoft=2 ihard=3 42' mnt
# touch mnt/project/file{1,2,3}
touch: cannot touch ‘mnt/project/file3’: Disk quota exceeded

We can make this consistent by not requiring the flag to be set at the
top of the callchain; instead we can simply test whether we are
reserving a project quota with XFS_QM_ISPDQ in xfs_trans_dqresv and if
so, return -ENOSPC for that failure. This removes the need for the
XFS_QMOPT_ENOSPC altogether and simplifies the code a fair bit.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 7994aae8 22-Apr-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: remove unnecessary assertion from xfs_qm_vop_create_dqattach

The check XFS_IS_QUOTA_RUNNING() has been done when enter the
xfs_qm_vop_create_dqattach() function, it will return directly
if the result is false, so the followed XFS_IS_QUOTA_RUNNING()
assertion is unnecessary. If we truly care about this, the check
also can be added to the condition of next if statements.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# fb353ff1 22-Apr-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: reserve quota inode transaction space only when needed

We share an inode between gquota and pquota with the older
superblock that doesn't have separate pquotino, and for the
need_alloc == false case we don't need to call xfs_dir_ialloc()
function, so add the check if reserved free disk blocks is
needed.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# c140735b 22-Apr-2020 Kaixu Xia <kaixuxia@tencent.com>

xfs: trace quota allocations for all quota types

The trace event xfs_dquot_dqalloc does not depend on the
value uq, so remove the condition, and trace quota allocations
for all quota types.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 8d3d7e2b 27-Mar-2020 Brian Foster <bfoster@redhat.com>

xfs: trylock underlying buffer on dquot flush

A dquot flush currently blocks on the buffer lock for the underlying
dquot buffer. In turn, this causes xfsaild to block rather than
continue processing other items in the meantime. Update
xfs_qm_dqflush() to trylock the buffer, similar to how inode buffers
are handled, and return -EAGAIN if the lock fails. Fix up any
callers that don't currently handle the error properly.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 5885539f 24-Mar-2020 Darrick J. Wong <darrick.wong@oracle.com>

xfs: preserve default grace interval during quotacheck

When quotacheck runs, it zeroes all the timer fields in every dquot.
Unfortunately, it also does this to the root dquot, which erases any
preconfigured grace intervals and warning limits that the administrator
may have set. Worse yet, the incore copies of those variables remain
set. This cache coherence problem manifests itself as the grace
interval mysteriously being reset back to the defaults at the /next/
mount.

Fix it by not resetting the root disk dquot's timer and warning fields.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# ba8adad5 21-Feb-2020 Christoph Hellwig <hch@lst.de>

xfs: remove the kuid/kgid conversion wrappers

Remove the XFS wrappers for converting from and to the kuid/kgid types.
Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but
in a few spots the calls to the conversion functions is open coded.
To match the use of sb->s_user_ns in the helpers and other file systems,
sb->s_user_ns is also used in the quota code. The ACL code already does
the conversion in a grotty layering violation in the VFS xattr code,
so it keeps using init_user_ns for the identity mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 54295159 21-Feb-2020 Christoph Hellwig <hch@lst.de>

xfs: remove the icdinode di_uid/di_gid members

Use the Linux inode i_uid/i_gid members everywhere and just convert
from/to the scalar value when reading or writing the on-disk inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# c072fbef 12-Nov-2019 Pavel Reichl <preichl@redhat.com>

xfs: remove the xfs_quotainfo_t typedef

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# aefe69a4 12-Nov-2019 Pavel Reichl <preichl@redhat.com>

xfs: remove the xfs_disk_dquot_t and xfs_dquot_t

Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix some of the comments]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# de7a866f 12-Nov-2019 Christoph Hellwig <hch@lst.de>

xfs: merge the projid fields in struct xfs_icdinode

There is no point in splitting the fields like this in an purely
in-memory structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# a71895c5 11-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: convert open coded corruption check to use XFS_IS_CORRUPT

Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# a5155b87 02-Nov-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: always log corruption errors

Make sure we log something to dmesg whenever we return -EFSCORRUPTED up
the call stack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 707e0dda 26-Aug-2019 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

fs: xfs: Remove KM_NOSLEEP and KM_SLEEP.

Since no caller is using KM_NOSLEEP and no callee branches on KM_SLEEP,
we can remove KM_NOSLEEP and replace KM_SLEEP with 0.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 13d59a2a 03-Jul-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: specify AG in bulk req

Add a new xfs_bulk_ireq flag to constrain the iteration to a single AG.
If the passed-in startino value is zero then we start with the first
inode in the AG that the user passes in; otherwise, we iterate only
within the same AG as the passed-in inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 3e5a428b 03-Jul-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: poll waiting for quotacheck

Create a pwork destroy function that uses polling instead of
uninterruptible sleep to wait for work items to finish so that we can
touch the softlockup watchdog. IOWs, gross hack.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 40786717 03-Jul-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: multithreaded iwalk implementation

Create a parallel iwalk implementation and switch quotacheck to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# ebd126a6 02-Jul-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: convert quotacheck to use the new iwalk functions

Convert quotacheck to use the new iwalk iterator to dig through the
inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 250d4b4c 28-Jun-2019 Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused header files

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this. I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 394aafdc 17-Apr-2019 Darrick J. Wong <darrick.wong@oracle.com>

xfs: widen inode delalloc block counter to 64-bits

Widen the incore inode's i_delayed_blks counter to be a 64-bit integer.
This is necessary to fix an integer overflow problem that can be
reproduced easily now that we use the counter to track blocks that are
assigned to the inode in memory but not on disk. This includes actual
delalloc reservations as well as real extents in the COW fork that
are waiting to be remapped into the data fork.

These 'delayed mapping' blocks can easily exceed 2^32 blocks if one
creates a very large sparse file of size approximately 2^33 bytes with
one byte written every 2^23 bytes, sets a very large COW extent size
hint of 2^23 blocks, reflinks the first file into a second file, and
then writes a single byte every 2^23 blocks in the original file.

When this happens, we'll try to create approximately 1024 2^23 extent
reservations in the COW fork, which will overflow the counter and cause
problems.

Note that on x64 we end up filling a 4-byte gap in the structure so this
doesn't increase the incore size.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 44a8736b 25-Jul-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: clean up IRELE/iput callsites

Replace the IRELE macro with a proper function so that we can do proper
typechecking and so that we can stop open-coding iput in scrub, which
means that we'll be able to ftrace inode lifetimes going through scrub
correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 0b61f8a4 05-Jun-2018 Dave Chinner <dchinner@redhat.com>

xfs: convert to SPDX license tags

Remove the verbose license text from XFS files and replace them
with SPDX tags. This does not change the license of any of the code,
merely refers to the common, up-to-date license files in LICENSES/

This change was mostly scripted. fs/xfs/Makefile and
fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
and modified by the following command:

for f in `git grep -l "GNU General" fs/xfs/` ; do
echo $f
cat $f | awk -f hdr.awk > $f.new
mv -f $f.new $f
done

And the hdr.awk script that did the modification (including
detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
is as follows:

$ cat hdr.awk
BEGIN {
hdr = 1.0
tag = "GPL-2.0"
str = ""
}

/^ \* This program is free software/ {
hdr = 2.0;
next
}

/any later version./ {
tag = "GPL-2.0+"
next
}

/^ \*\// {
if (hdr > 0.0) {
print "// SPDX-License-Identifier: " tag
print str
print $0
str=""
hdr = 0.0
next
}
print $0
next
}

/^ \* / {
if (hdr > 1.0)
next
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
next
}

/^ \*/ {
if (hdr > 0.0)
next
print $0
next
}

// {
if (hdr > 0.0) {
if (str != "")
str = str "\n"
str = str $0
next
}
print $0
}

END { }
$

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# eac69e16 14-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: refactor quota limits initialization

Replace all the if (!error) weirdness with helper functions that follow
our regular coding practices, and factor out the ternary expression soup.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 28b9060b 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: rename on-disk dquot counter zap functions

The function 'xfs_qm_dqiterate' doesn't iterate dquots at all, it
iterates all dquot blocks of a quota inode and clears the counters.
Therefore, change the name to something more descriptive so that we can
introduce a real dquot iterator later.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 30ab2dcf 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: replace XFS_QMOPT_DQALLOC with a simple boolean

DQALLOC is only ever used with xfs_qm_dqget*, and the only flag that the
_dqget family of functions cares about is DQALLOC. Therefore, change
it to a boolean 'can alloc?' flag for the dqget interfaces where that
makes sense.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 114e73cc 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove direct calls to _qm_dqread

The quota initialization code needs an "uncached" variant of _dqget to
read in default quota limits and timers before the dquot cache is fully
set up. We've already split up _dqget into its component pieces so
create a fourth variant to address this need, and make dqread internal
to xfs_dquot.c again.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 0fcef127 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: fetch dquots directly during quotacheck

Quotacheck only runs during mount, which means that there are no other
processes in the system that could be doing chown or chproj. Therefore
there's no potential for racing to attach dquots to the inode so we can
drop all the ILOCK and race detection bits from quotacheck.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 4882c19d 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: split out dqget for inodes from regular dqget

There are two uses of dqget here -- one is to return the dquot for a
given type and id, and the other is to return the dquot for a given type
and inode. Those are two separate things, so split them into two
smaller functions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c14cfcca 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: remove unnecessary xfs_qm_dqattach parameter

The flags argument is always zero, get rid of it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 609001bc 04-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: don't spray logs when dquot flush/purge fail

When dquot flush or purge fail there's no need to spam the logs, we've
already logged the IO error or fs shutdown that caused the flush
failures.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>


# 22525c17 09-May-2018 Dave Chinner <dchinner@redhat.com>

xfs: log item flags are racy

The log item flags contain a field that is protected by the AIL
lock - the XFS_LI_IN_AIL flag. We use non-atomic RMW operations to
set and clear these flags, but most of the updates and checks are
not done with the AIL lock held and so are susceptible to update
races.

Fix this by changing the log item flags to use atomic bitops rather
than be reliant on the AIL lock for update serialisation.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 7224fa48 07-May-2018 Eric Sandeen <sandeen@sandeen.net>

xfs: add full xfs_dqblk verifier

Add an xfs_dqblk verifier so that it can check the uuid on V5 filesystems;
it calls the existing xfs_dquot_verify verifier to validate the
xfs_disk_dquot_t contained inside it. This lets us move the uuid
verification out of the crc verifier, which makes little sense.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 48fa1db8 07-May-2018 Eric Sandeen <sandeen@sandeen.net>

xfs: pass full xfs_dqblk to repair during quotacheck

It's a bit dicey to pass in the smaller xfs_disk_dquot and then cast it to
something larger; pass in the full xfs_dqblk so we know the caller has sent
us the right thing. Rename the function to xfs_dqblk_repair for
clarity.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# e381a0f6 04-May-2018 Eric Sandeen <sandeen@sandeen.net>

xfs: remove unused flags arg from xfs_dquot_verify

Long ago the flags argument was used to determine whether to issue warnings
about corruptions, but that's done elsewhere now and the flag is unused
here, so remove it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 8925a3dc 18-Apr-2018 Dave Chinner <dchinner@redhat.com>

xfs: make xfs_buf_incore out of line

Move xfs_buf_incore out of line and make it the only way to look up
a buffer in the buffer cache from outside the buffer cache. Convert
the external users of _xfs_buf_find() to xfs_buf_incore() and make
_xfs_buf_find() static.

Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: actually rename xfs_incore -> xfs_buf_incore]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# c959025e 02-Apr-2018 Chandan Rajendra <chandan@linux.vnet.ibm.com>

xfs: Remove "committed" argument of xfs_dir_ialloc

xfs_dir_ialloc() rolls the current transaction when allocation of a new
inode required the space manager to perform an allocation and replinish
the Inode btree.

None of the callers of xfs_dir_ialloc() need to know if the
transaction was committed. Hence this commit removes the "committed"
argument of xfs_dir_ialloc.

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# c9690043 09-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: use %px for data pointers when debugging

Starting with commit 57e734423ad ("vsprintf: refactor %pK code out of
pointer"), the behavior of the raw '%p' printk format specifier was
changed to print a 32-bit hash of the pointer value to avoid leaking
kernel pointers into dmesg. For most situations that's good.

This is /undesirable/ behavior when we're trying to debug XFS, however,
so define a PTR_FMT that prints the actual pointer when we're in debug
mode.

Note that %p for tracepoints still prints the raw pointer, so in the
long run we could consider rewriting some of these messages as
tracepoints.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# eebf3cab 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: standardize quota verification function outputs

Rename xfs_dqcheck to xfs_dquot_verify and make it return an
xfs_failaddr_t like every other structure verifier function.
This enables us to check on-disk quotas in the same way that we check
everything else. Callers are now responsible for logging errors, as
XFS_QMOPT_DOWARN goes away.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# eeea7980 08-Jan-2018 Darrick J. Wong <darrick.wong@oracle.com>

xfs: separate dquot repair into a separate function

Move the dquot repair code into a separate function and remove
XFS_QMOPT_DQREPAIR in favor of calling the helper directly. Remove
other dead code because quotacheck is the only caller of DQREPAIR.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>


# 3a3882ff 21-Dec-2017 Aliaksei Karaliou <akaraliou.dev@gmail.com>

xfs: quota: check result of register_shrinker()

xfs_qm_init_quotainfo() does not check result of register_shrinker()
which was tagged as __must_check recently, reported by sparse.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
[darrick: move xfs_qm_destroy_quotainos nearer xfs_qm_init_quotainos]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 21968815 21-Dec-2017 Aliaksei Karaliou <akaraliou.dev@gmail.com>

xfs: quota: fix missed destroy of qi_tree_lock

xfs_qm_destroy_quotainfo() does not destroy quotainfo->qi_tree_lock
while destroys quotainfo->qi_quotaofflock.

Signed-off-by: Aliaksei Karaliou <akaraliou.dev@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# f59cf5c2 04-Dec-2017 Christoph Hellwig <hch@lst.de>

xfs: remove "no-allocation" reservations for file creations

If we create a new file we will need an inode, and usually some metadata
in the parent direction. Aiming for everything to go well despite the
lack of a reservation leads to dirty transactions cancelled under a heavy
create/delete load. This patch removes those nospace transactions, which
will lead to slightly earlier ENOSPC on some workloads, but instead
prevent file system shutdowns due to cancelling dirty transactions for
others.

A customer could observe assertations failures and shutdowns due to
cancelation of dirty transactions during heavy NFS workloads as shown
below:

2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262

2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246] [<ffffffff81795daf>] dump_stack+0x63/0x81
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262] [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264] [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285] [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308] [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329] [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348] [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366] [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380] [<ffffffff81231de5>] vfs_create+0xd5/0x140
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390] [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396] [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401] [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416] [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423] [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427] [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432] [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438] [<ffffffff810c0d58>] kthread+0xd8/0xf0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451] [<ffffffff8179d962>] ret_from_fork+0x42/0x70
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453] [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---

2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c. Caller xfs_create+0x4ee/0x7d0 [xfs]

2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 8bfadd8d 29-Aug-2017 Christoph Hellwig <hch@lst.de>

xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leaves

Use the existing functionality instead of directly poking into the extent
list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# cfaf2d03 24-Jul-2017 Brian Foster <bfoster@redhat.com>

xfs: fix quotacheck dquot id overflow infinite loop

If a dquot has an id of U32_MAX, the next lookup index increment
overflows the uint32_t back to 0. This starts the lookup sequence
over from the beginning, repeats indefinitely and results in a
livelock.

Update xfs_qm_dquot_walk() to explicitly check for the lookup
overflow and exit the loop.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 7912e7fe 14-Jun-2017 Brian Foster <bfoster@redhat.com>

xfs: push buffer of flush locked dquot to avoid quotacheck deadlock

Reclaim during quotacheck can lead to deadlocks on the dquot flush
lock:

- Quotacheck populates a local delwri queue with the physical dquot
buffers.
- Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
dirties all of the dquots.
- Reclaim kicks in and attempts to flush a dquot whose buffer is
already queud on the quotacheck queue. The flush succeeds but
queueing to the reclaim delwri queue fails as the backing buffer is
already queued. The flush unlock is now deferred to I/O completion
of the buffer from the quotacheck queue.
- The dqadjust bulkstat continues and dirties the recently flushed
dquot once again.
- Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
the flush lock to update the backing buffers with the in-core
recalculated values. It deadlocks on the redirtied dquot as the
flush lock was already acquired by reclaim, but the buffer resides
on the local delwri queue which isn't submitted until the end of
quotacheck.

This is reproduced by running quotacheck on a filesystem with a
couple million inodes in low memory (512MB-1GB) situations. This is
a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
buffer lists"), which removed a trylock and buffer I/O submission
from the quotacheck dquot flush sequence.

Quotacheck first resets and collects the physical dquot buffers in a
delwri queue. Then, it traverses the filesystem inodes via bulkstat,
updates the in-core dquots, flushes the corrected dquots to the
backing buffers and finally submits the delwri queue for I/O. Since
the backing buffers are queued across the entire quotacheck
operation, dquot reclaim cannot possibly complete a dquot flush
before quotacheck completes.

Therefore, quotacheck must submit the buffer for I/O in order to
cycle the flush lock and flush the dirty in-core dquot to the
buffer. Add a delwri queue buffer push mechanism to submit an
individual buffer for I/O without losing the delwri queue status and
use it from quotacheck to avoid the deadlock. This restores
quotacheck behavior to as before the regression was introduced.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 20e8a063 21-Apr-2017 Brian Foster <bfoster@redhat.com>

xfs: fix up quotacheck buffer list error handling

The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 90115407 19-Apr-2017 Eric Sandeen <sandeen@sandeen.net>

xfs: remove use of do_div with 32-bit dividend in quota

The kbuild test robot caught this; in debug code we have another
caller of do_div with a 32-bit dividend (j) which is caught now
that we are using the kernel-supplied do_div.

None of the values used here are 64-bit; just use simple division.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# e0d76fa4 26-Jan-2017 Brian Foster <bfoster@redhat.com>

xfs: prevent quotacheck from overloading inode lru

Quotacheck runs at mount time in situations where quota accounting must
be recalculated. In doing so, it uses bulkstat to visit every inode in
the filesystem. Historically, every inode processed during quotacheck
was released and immediately tagged for reclaim because quotacheck runs
before the superblock is marked active by the VFS. In other words,
the final iput() lead to an immediate ->destroy_inode() call, which
allowed the XFS background reclaim worker to start reclaiming inodes.

Commit 17c12bcd3 ("xfs: when replaying bmap operations, don't let
unlinked inodes get reaped") marks the XFS superblock active sooner as
part of the mount process to support caching inodes processed during log
recovery. This occurs before quotacheck and thus means all inodes
processed by quotacheck are inserted to the LRU on release. The
s_umount lock is held until the mount has completed and thus prevents
the shrinkers from operating on the sb. This means that quotacheck can
excessively populate the inode LRU and lead to OOM conditions on systems
without sufficient RAM.

Update the quotacheck bulkstat handler to set XFS_IGET_DONTCACHE on
inodes processed by quotacheck. This causes ->drop_inode() to return 1
and in turn causes iput_final() to evict the inode. This preserves the
original quotacheck behavior and prevents it from overloading the LRU
and running out of memory.

CC: stable@vger.kernel.org # v4.9
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# 5d829300 07-Nov-2016 Eric Sandeen <sandeen@sandeen.net>

xfs: provide helper for counting extents from if_bytes

The open-coded pattern:

ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t)

is all over the xfs code; provide a new helper
xfs_iext_count(ifp) to count the number of inline extents
in an inode fork.

[dchinner: pick up several missed conversions]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 253f4911 05-Apr-2016 Christoph Hellwig <hch@lst.de>

xfs: better xfs_trans_alloc interface

Merge xfs_trans_reserve and xfs_trans_alloc into a single function call
that returns a transaction with all the required log and block reservations,
and which allows passing transaction flags directly to avoid the cumbersome
_xfs_trans_alloc interface.

While we're at it we also get rid of the transaction type argument that has
been superflous since we stopped supporting the non-CIL logging mode. The
guts of it will be removed in another patch.

[dchinner: fixed transaction leak in error path in xfs_setattr_nonsize]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# be607946 07-Feb-2016 Carlos Maiolino <cmaiolino@redhat.com>

xfs: Split default quota limits by quota type

Default quotas are globally set due historical reasons. IRIX only
supported user and project quotas, and default quota was only
applied to user quotas.

In Linux, when a default quota is set, all different quota types
inherits the same default value.

An user with a quota limit larger than the default quota value, will
still be limited to the default value because the group quotas also
inherits the default quotas. Unless the group which the user belongs
to have a custom quota limit set.

This patch aims to split the default quota value by quota type.
Allowing each quota type having different default values.

Default time limits are still set globally. XFS does not set a
per-user/group timer, but a single global timer. For changing this
behavior, some changes should be made in user-space tools another
bugs being fixed.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# d0164adc 06-Nov-2015 Mel Gorman <mgorman@techsingularity.net>

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ff6d6af2 12-Oct-2015 Bill O'Donnell <billodo@redhat.com>

xfs: per-filesystem stats counter implementation

This patch modifies the stats counting macros and the callers
to those macros to properly increment, decrement, and add-to
the xfs stats counts. The counts for global and per-fs stats
are correctly advanced, and cleared by writing a "1" to the
corresponding clear file.

global counts: /sys/fs/xfs/stats/stats
per-fs counts: /sys/fs/xfs/sda*/stats/stats

global clear: /sys/fs/xfs/stats/stats_clear
per-fs clear: /sys/fs/xfs/sda*/stats/stats_clear

[dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
stats structures and macros. ]

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 70393313 03-Jun-2015 Christoph Hellwig <hch@lst.de>

xfs: saner xfs_trans_commit interface

The flags argument to xfs_trans_commit is not useful for most callers, as
a commit of a transaction without a permanent log reservation must pass
0 here, and all callers for a transaction with a permanent log reservation
except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES. So remove
the flags argument from the public xfs_trans_commit interfaces, and
introduce low-level __xfs_trans_commit variant just for xfs_trans_roll
that regrants a log reservation instead of releasing it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4906e215 03-Jun-2015 Christoph Hellwig <hch@lst.de>

xfs: remove the flags argument to xfs_trans_cancel

xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
XFS_TRANS_ABORT. Both of them are a direct product of the transaction
state, and can be deducted:

- any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
- any transaction with a permanent log reservation needs
XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
log reservation is invalid.

So just remove the flags argument and do the right thing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 58c90473 23-Feb-2015 Dave Chinner <dchinner@redhat.com>

xfs: inodes are new until the dentry cache is set up

Al Viro noticed a generic set of issues to do with filehandle lookup
racing with dentry cache setup. They involve a filehandle lookup
occurring while an inode is being created and the filehandle lookup
racing with the dentry creation for the real file. This can lead to
multiple dentries for the one path being instantiated. There are a
host of other issues around this same set of paths.

The underlying cause is that file handle lookup only waits on inode
cache instantiation rather than full dentry cache instantiation. XFS
is mostly immune to the problems discovered due to it's own internal
inode cache, but there are a couple of corner cases where races can
happen.

We currently clear the XFS_INEW flag when the inode is fully set up
after insertion into the cache. Newly allocated inodes are inserted
locked and so aren't usable until the allocation transaction
commits. This, however, occurs before the dentry and security
information is fully initialised and hence the inode is unlocked and
available for lookups to find too early.

To solve the problem, only clear the XFS_INEW flag for newly created
inodes once the dentry is fully instantiated. This means lookups
will retry until the XFS_INEW flag is removed from the inode and
hence avoids the race conditions in questions.

THis also means that xfs_create(), xfs_create_tmpfile() and
xfs_symlink() need to finish the setup of the inode in their error
paths if we had allocated the inode but failed later in the creation
process. xfs_symlink(), in particular, needed a lot of help to make
it's error handling match that of xfs_create().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# dfcc70a8 23-Feb-2015 Jan Kara <jack@suse.cz>

xfs: Fix quota type in quota structures when reusing quota file

For filesystems without separate project quota inode field in the
superblock we just reuse project quota file for group quotas (and vice
versa) if project quota file is allocated and we need group quota file.
When we reuse the file, quota structures on disk suddenly have wrong
type stored in d_flags though. Nobody really cares about this (although
structure type reported to userspace was wrong as well) except
that after commit 14bf61ffe6ac (quota: Switch ->get_dqblk() and
->set_dqblk() to use bytes as space units) assertion in
xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
was testing without XFS_DEBUG so I didn't notice when submitting the
above commit).

Fix the problem by properly resetting ddq->d_flags when running quotacheck
for a quota file.

CC: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 3f97b163 12-Feb-2015 Vladimir Davydov <vdavydov.dev@gmail.com>

list_lru: add helpers to isolate items

Currently, the isolate callback passed to the list_lru_walk family of
functions is supposed to just delete an item from the list upon returning
LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by
__list_lru_walk_one after the callback returns. Since the callback is
allowed to drop the lock after removing an item (it has to return
LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
of elements on the list even if we check them under the lock. This makes
it difficult to move items from one list_lru_one to another, which is
required for per-memcg list_lru reparenting - we can't just splice the
lists, we have to move entries one by one.

This patch therefore introduces helpers that must be used by callback
functions to isolate items instead of raw list_del/list_move. These are
list_lru_isolate and list_lru_isolate_move. They not only remove the
entry from the list, but also fix the nr_items counter, making sure
nr_items always reflects the actual number of elements on the list if
checked under the appropriate lock.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 503c358c 12-Feb-2015 Vladimir Davydov <vdavydov.dev@gmail.com>

list_lru: introduce list_lru_shrink_{count,walk}

Kmem accounting of memcg is unusable now, because it lacks slab shrinker
support. That means when we hit the limit we will get ENOMEM w/o any
chance to recover. What we should do then is to call shrink_slab, which
would reclaim old inode/dentry caches from this cgroup. This is what
this patch set is intended to do.

Basically, it does two things. First, it introduces the notion of
per-memcg slab shrinker. A shrinker that wants to reclaim objects per
cgroup should mark itself as SHRINKER_MEMCG_AWARE. Then it will be
passed the memory cgroup to scan from in shrink_control->memcg. For
such shrinkers shrink_slab iterates over the whole cgroup subtree under
the target cgroup and calls the shrinker for each kmem-active memory
cgroup.

Secondly, this patch set makes the list_lru structure per-memcg. It's
done transparently to list_lru users - everything they have to do is to
tell list_lru_init that they want memcg-aware list_lru. Then the
list_lru will automatically distribute objects among per-memcg lists
basing on which cgroup the object is accounted to. This way to make FS
shrinkers (icache, dcache) memcg-aware we only need to make them use
memcg-aware list_lru, and this is what this patch set does.

As before, this patch set only enables per-memcg kmem reclaim when the
pressure goes from memory.limit, not from memory.kmem.limit. Handling
memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
it is still unclear whether we will have this knob in the unified
hierarchy.

This patch (of 9):

NUMA aware slab shrinkers use the list_lru structure to distribute
objects coming from different NUMA nodes to different lists. Whenever
such a shrinker needs to count or scan objects from a particular node,
it issues commands like this:

count = list_lru_count_node(lru, sc->nid);
freed = list_lru_walk_node(lru, sc->nid, isolate_func,
isolate_arg, &sc->nr_to_scan);

where sc is an instance of the shrink_control structure passed to it
from vmscan.

To simplify this, let's add special list_lru functions to be used by
shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
consolidate the nid and nr_to_scan arguments in the shrink_control
structure.

This will also allow us to avoid patching shrinkers that use list_lru
when we make shrink_slab() per-memcg - all we will have to do is extend
the shrink_control structure to include the target memcg and make
list_lru_shrink_{count,walk} handle this appropriately.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 61e63ecb 21-Jan-2015 Dave Chinner <dchinner@redhat.com>

xfs: consolidate superblock logging functions

We now have several superblock loggin functions that are identical
except for the transaction reservation and whether it shoul dbe a
synchronous transaction or not. Consolidate these all into a single
function, a single reserveration and a sync flag and call it
xfs_sync_sb().

Also, xfs_mod_sb() is not really a modification function - it's the
operation of logging the superblock buffer. hence change the name of
it to reflect this.

Note that we have to change the mp->m_update_flags that are passed
around at mount time to a boolean simply to indicate a superblock
update is needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4d11a402 21-Jan-2015 Dave Chinner <dchinner@redhat.com>

xfs: remove bitfield based superblock updates

When we log changes to the superblock, we first have to write them
to the on-disk buffer, and then log that. Right now we have a
complex bitfield based arrangement to only write the modified field
to the buffer before we log it.

This used to be necessary as a performance optimisation because we
logged the superblock buffer in every extent or inode allocation or
freeing, and so performance was extremely important. We haven't done
this for years, however, ever since the lazy superblock counters
pulled the superblock logging out of the transaction commit
fast path.

Hence we have a bunch of complexity that is not necessary that makes
writing the in-core superblock to disk much more complex than it
needs to be. We only need to log the superblock now during
management operations (e.g. during mount, unmount or quota control
operations) so it is not a performance critical path anymore.

As such, remove the complex field based logging mechanism and
replace it with a simple conversion function similar to what we use
for all other on-disk structures.

This means we always log the entirity of the superblock, but again
because we rarely modify the superblock this is not an issue for log
bandwidth or CPU time. Indeed, if we do log the superblock
frequently, delayed logging will minimise the impact of this
overhead.

[Fixed gquota/pquota inode sharing regression noticed by bfoster.]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# d2a5e3c6 30-Nov-2014 Markus Elfring <elfring@users.sourceforge.net>

xfs: remove unnecessary null checks

The functions xfs_blkdev_put() and xfs_qm_dqrele() test whether
their argument is NULL and then return immediately. Thus the test
around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 4fb6e8ad 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: merge xfs_ag.h into xfs_format.h

More on-disk format consolidation. A few declarations that weren't on-disk
format related move into better suitable spots.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 6d3ebaae 27-Nov-2014 Christoph Hellwig <hch@lst.de>

xfs: merge xfs_dinode.h into xfs_format.h

More consolidatation for the on-disk format defintions. Note that the
XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related
to the on disk format, but depends on a CONFIG_ option.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# bf1ed383 28-Sep-2014 Dave Chinner <dchinner@redhat.com>

xfs: xfs_qm_dquot_isolate needs locking annotations for sparse

To remove noise from the build.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 5fd364fe 03-Aug-2014 Dave Chinner <dchinner@redhat.com>

xfs: quotacheck leaves dquot buffers without verifiers

When running xfs/305, I noticed that quotacheck was flushing dquot
buffers that did not have the xfs_dquot_buf_ops verifiers attached:

XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8
ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00 DQ....e.........
ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001
ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000
ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80
Call Trace:
[<ffffffff81cf1cca>] dump_stack+0x45/0x56
[<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0
[<ffffffff810db520>] ? wake_up_state+0x20/0x20
[<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0
[<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0
[<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0
[<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220
[<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90
[<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90
[<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0
[<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0
[<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0
[<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340
[<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0
[<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170
[<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20
[<ffffffff811ca8c9>] mount_fs+0x39/0x1b0
[<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120
[<ffffffff811e757e>] do_mount+0x23e/0xad0
[<ffffffff8117abde>] ? __get_free_pages+0xe/0x50
[<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150
[<ffffffff811e8103>] SyS_mount+0x83/0xc0
[<ffffffff81cfd40b>] tracesys+0xdd/0xe2

This was caused by dquot buffer readahead not attaching a verifier
structure to the buffer when readahead was issued, resulting in the
followup read of the buffer finding a valid buffer and so not
attaching new verifiers to the buffer as part of the read.

Also, when a verifier failure occurs, we then read the buffer
without verifiers. Attach the verifiers manually after this read so
that if the buffer is then written it will be verified that the
corruption has been repaired.

Further, when flushing a dquot we don't ask for a verifier when
reading in the dquot buffer the dquot belongs to. Most of the time
this isn't an issue because the buffer is still cached, but when it
is not cached it will result in writing the dquot buffer without
having the verfier attached.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# eb866bbf 24-Jul-2014 Jie Liu <jeff.liu@oracle.com>

xfs: mark xfs_qm_quotacheck as static

xfs_qm_quotacheck() is not used outside of xfs_qm.c. Mark it static
and move it around in the file to avoid a forward declaration.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 2451337d 24-Jun-2014 Dave Chinner <dchinner@redhat.com>

xfs: global error sign conversion

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# b474c7ae 21-Jun-2014 Eric Sandeen <sandeen@sandeen.net>

xfs: Nuke XFS_ERROR macro

XFS_ERROR was designed long ago to trap return values, but it's not
runtime configurable, it's not consistently used, and we can do
similar error trapping with ftrace scripts and triggers from
userspace.

Just nuke XFS_ERROR and associated bits.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# d99831ff 21-Jun-2014 Eric Sandeen <sandeen@sandeen.net>

xfs: return is not a function

return is not a function. "return(EIO);" is silly;
"return (EIO);" moreso. return is not a function.
Nuke the pointless parens.

[dchinner: catch a couple of extra cases in xfs_attr_list.c,
xfs_acl.c and xfs_linux.h.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# ee4eec47 14-May-2014 Dave Chinner <dchinner@redhat.com>

xfs: list_lru_init returns a negative error

And we don't invert it properly when initialising the dquot lru
list.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 3c3533757 05-May-2014 Dave Chinner <dchinner@redhat.com>

xfs: remove dquot hints

group and project quota hints are currently stored on the user
dquot. If we are attaching quotas to the inode, then the group and
project dquots are stored as hints on the user dquot to save having
to look them up again later.

The thing is, the hints are not used for that inode for the rest of
the life of the inode - the dquots are attached directly to the
inode itself - so the only time the hints are used is when an inode
first has dquots attached.

When the hints on the user dquot don't match the dquots being
attache dto the inode, they are then removed and replaced with the
new hints. If a user is concurrently modifying files in different
group and/or project contexts, then this leads to thrashing of the
hints attached to user dquot.

If user quotas are not enabled, then hints are never even used.

So, if the hints are used to avoid the cost of the lookup, is the
cost of the lookup significant enough to justify the hint
infrstructure? Maybe it was once, when there was a global quota
manager shared between all XFS filesystems and was hash table based.

However, lookups are now much simpler, requiring only a single lock and
radix tree lookup local to the filesystem and no hash or LRU
manipulations to be made. Hence the cost of lookup is much lower
than when hints were implemented. Turns out that benchmarks show
that, too, with thir being no differnce in performance when doing
file creation workloads as a single user with user, group and
project quotas enabled - the hints do not make the code go any
faster. In fact, removing the hints shows a 2-3% reduction in the
time it takes to create 50 million inodes....

So, let's just get rid of the hints and the complexity around them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# 6ea94bb5 14-Apr-2014 Eric Sandeen <sandeen@redhat.com>

xfs: remove unused mp arg from xfs_calc_dquots_per_chunk()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>


# da51d32d 06-Dec-2013 Christoph Hellwig <hch@infradead.org>

xfs: use xfs_ilock_data_map_shared in xfs_qm_dqiterate

We might not have read in the extent list at this point, so make sure we
take the ilock exclusively if we have to do so.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 718cc6f8 26-Nov-2013 Jie Liu <jeff.liu@oracle.com>

xfs: fix infinite loop by detaching the group/project hints from user dquot

xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
# mount -ouquota,gquota /dev/sda7 /xfs
# mkdir /xfs/test
# xfs_quota -xc 'off -g' /xfs <-- hangs up
# echo w > /proc/sysrq-trigger
# dmesg

SysRq : Show Blocked State
task PC stack pid father
xfs_quota D 0000000000000000 0 27574 2551 0x00000000
[snip]
Call Trace:
[<ffffffff81aaa21d>] schedule+0xad/0xc0
[<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
[<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
[<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
[<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
[<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
[<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
[<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
[<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
[<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
[<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
[<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
[<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
[<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
[<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
[<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
[<ffffffff812fba4b>] ? iput+0x5b/0x90
[<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
[<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s). Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
[ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit df8052e7dae00bde6f21b40b6e3e1099770f3afc)


# 30d161c9 26-Nov-2013 Jie Liu <jeff.liu@oracle.com>

xfs: fix false assertion at xfs_qm_vop_create_dqattach

After the previous fix, there still has another ASSERT failure if turning
off any type of quota while fsstress is running at the same time.

Backtrace in this case:

[ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[ 50.867924] ------------[ cut here ]------------
... <snip>
[ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[ 50.867999] invalid opcode: 0000 [#1] SMP
[ 50.869407] Call Trace:
[ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[ 50.870050] RSP <ffff88002941fd60>
[ 50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO. While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee
that the desired quota is still active.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 37eb9706ebf5b99d14c6086cdeef2c2f73f9c9fb)


# df8052e7 26-Nov-2013 Jie Liu <jeff.liu@oracle.com>

xfs: fix infinite loop by detaching the group/project hints from user dquot

xfs_quota(8) will hang up if trying to turn group/project quota off
before the user quota is off, this could be 100% reproduced by:
# mount -ouquota,gquota /dev/sda7 /xfs
# mkdir /xfs/test
# xfs_quota -xc 'off -g' /xfs <-- hangs up
# echo w > /proc/sysrq-trigger
# dmesg

SysRq : Show Blocked State
task PC stack pid father
xfs_quota D 0000000000000000 0 27574 2551 0x00000000
[snip]
Call Trace:
[<ffffffff81aaa21d>] schedule+0xad/0xc0
[<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
[<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
[<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
[<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
[<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
[<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
[<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
[<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
[<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
[<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
[<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
[<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
[<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
[<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
[<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
[<ffffffff812fba4b>] ? iput+0x5b/0x90
[<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
[<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b

It's fine if we turn user quota off at first, then turn off other
kind of quotas if they are enabled since the group/project dquot
refcount is decreased to zero once the user quota if off. Otherwise,
those dquots refcount is non-zero due to the user dquot might refer
to them as hint(s). Hence, above operation cause an infinite loop
at xfs_qm_dquot_walk() while trying to purge dquot cache.

This problem has been around since Linux 3.4, it was introduced by:
[ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

Originally we will release the group dquot pointers because the user
dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
However, with above change, there is no such work to be done before
purging group/project dquot cache.

In order to solve this problem, this patch introduces a special routine
xfs_qm_dqpurge_hints(), and it would release the group/project dquot
pointers the user dquots maybe carrying around as a hint, and then it
will proceed to purge the user dquot cache if requested.

Cc: stable@vger.kernel.org
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 37eb9706 26-Nov-2013 Jie Liu <jeff.liu@oracle.com>

xfs: fix false assertion at xfs_qm_vop_create_dqattach

After the previous fix, there still has another ASSERT failure if turning
off any type of quota while fsstress is running at the same time.

Backtrace in this case:

[ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
[ 50.867924] ------------[ cut here ]------------
... <snip>
[ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
[ 50.867999] invalid opcode: 0000 [#1] SMP
[ 50.869407] Call Trace:
[ 50.869446] [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
[ 50.869512] [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
[ 50.869564] [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
[ 50.869615] [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
[ 50.869655] [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
[ 50.869689] [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
[ 50.869723] [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
[ 50.869757] [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
[ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
[ 50.870003] RIP [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
[ 50.870050] RSP <ffff88002941fd60>
[ 50.879251] ---[ end trace c93a2b342341c65b ]---

We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
however the assertion itself is not right IMHO. While performing quota off, we
firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee
that the desired quota is still active.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# a4fbe6ab 22-Oct-2013 Dave Chinner <dchinner@redhat.com>

xfs: decouple inode and bmap btree header files

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 239880ef 22-Oct-2013 Dave Chinner <dchinner@redhat.com>

xfs: decouple log and transaction headers

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 9aede1d8 14-Oct-2013 Dave Chinner <dchinner@redhat.com>

xfs: split dquot buffer operations out

Parts of userspace want to be able to read and modify dquot buffers
(e.g. xfs_db) so we need to split out the reading and writing of
these buffers so it is easy to shared code with libxfs in userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 70a9883c 22-Oct-2013 Dave Chinner <dchinner@redhat.com>

xfs: create a shared header file for format-related information

All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file that we centralise these
related definitions. Start by moving the bffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# f5e1dd34 27-Aug-2013 Glauber Costa <glommer@gmail.com>

super: fix for destroy lrus

This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
and moves the deletion to after the shrinker is unregistered, as correctly
spotted by Dave

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 5ca302c8 27-Aug-2013 Glauber Costa <glommer@openvz.org>

list_lru: dynamically adjust node arrays

We currently use a compile-time constant to size the node array for the
list_lru structure. Due to this, we don't need to allocate any memory at
initialization time. But as a consequence, the structures that contain
embedded list_lru lists can become way too big (the superblock for
instance contains two of them).

This patch aims at ameliorating this situation by dynamically allocating
the node arrays with the firmware provided nr_node_ids.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 35163417 27-Aug-2013 Dave Chinner <david@fromorbit.com>

xfs: fix dquot isolation hang

The new LRU list isolation code in xfs_qm_dquot_isolate() isn't
completely up to date. Firstly, it needs conversion to return enum
lru_status values, not raw numbers. Secondly - most importantly - it
fails to unlock the dquot and relock the LRU in the LRU_RETRY path.
This leads to deadlocks in xfstests generic/232. Fix them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 2f5b56f8 27-Aug-2013 Andrew Morton <akpm@linux-foundation.org>

xfs-convert-dquot-cache-lru-to-list_lru-fix

fix warnings

Cc: Dave Chinner <dchinner@redhat.com>
Cc: Glauber Costa <glommer@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# cd56a39a 27-Aug-2013 Dave Chinner <dchinner@redhat.com>

xfs: convert dquot cache lru to list_lru

Convert the XFS dquot lru to use the list_lru construct and convert the
shrinker to being node aware.

[glommer@openvz.org: edited for conflicts + warning fixes]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Glauber Costa <glommer@openvz.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 55f841ce 27-Aug-2013 Glauber Costa <glommer@openvz.org>

super: fix calculation of shrinkable objects for small numbers

The sysctl knob sysctl_vfs_cache_pressure is used to determine which
percentage of the shrinkable objects in our cache we should actively try
to shrink.

It works great in situations in which we have many objects (at least more
than 100), because the aproximation errors will be negligible. But if
this is not the case, specially when total_objects < 100, we may end up
concluding that we have no objects at all (total / 100 = 0, if total <
100).

This is certainly not the biggest killer in the world, but may matter in
very low kernel memory situations.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Rientjes <rientjes@google.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: J. Bruce Fields <bfields@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7aab1b28 15-Aug-2013 Dwight Engen <dwight.engen@oracle.com>

xfs: convert kuid_t to/from uid_t for internal structures

Use uint32 from init_user_ns for xfs internal uid/gid
representation in xfs_icdinode, xfs_dqid_t.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 3d3c8b52 12-Aug-2013 Jie Liu <jeff.liu@oracle.com>

xfs: refactor xfs_trans_reserve() interface

With the new xfs_trans_res structure has been introduced, the log
reservation size, log count as well as log flags are pre-initialized
at mount time. So it's time to refine xfs_trans_reserve() interface
to be more neat.

Also, introduce a new helper M_RES() to return a pointer to the
mp->m_resv structure to simplify the input.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# e546cb79 12-Aug-2013 Dave Chinner <dchinner@redhat.com>

xfs: consolidate xfs_utils.c

There are a few small helper functions in xfs_util, all related to
xfs_inode modifications. Move them all to xfs_inode.c so all
xfs_inode operations are consiolidated in the one place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 6ca1c906 12-Aug-2013 Dave Chinner <dchinner@redhat.com>

xfs: separate dquot on disk format definitions out of xfs_quota.h

The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# d892d586 19-Jul-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Start using pquotaino from the superblock.

Start using pquotino and define a macro to check if the
superblock has pquotino.

Keep backward compatibilty by alowing mount of older superblock
with no separate pquota inode.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 92f8ff73 10-Jul-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Add pquota fields where gquota is used.

Add project quota changes to all the places where group quota field
is used:
* add separate project quota members into various structures
* split project quota and group quotas so that instead of overriding
the group quota members incore, the new project quota members are
used instead
* get rid of usage of the OQUOTA flag incore, in favor of separate
group and project quota flags.
* add a project dquot argument to various functions.

Not using the pquotino field from superblock yet.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 83e782e1 27-Jun-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD

Remove all incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD. Instead,
start using XFS_GQUOTA_.* XFS_PQUOTA_.* counterparts for GQUOTA and
PQUOTA respectively.

On-disk copy still uses XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD.

Read and write of the superblock does the conversion from *OQUOTA*
to *[PG]QUOTA*.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 113a5683 27-Jun-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Code cleanup and removal of some typedef usage

In preparation for combined pquota/gquota support, for the sake
of readability, do some code cleanup surrounding the affected
code.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 329e0875 27-Jun-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Replace macro XFS_DQUOT_TREE with a function

In preparation for combined pquota/gquota support, for the sake
of readability, change the macro to an inline function.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 9cad19d2 27-Jun-2013 Chandra Seetharaman <sekharan@us.ibm.com>

xfs: Define a new function xfs_is_quota_inode()

In preparation for combined pquota/gquota support, define
a new function to check if the given inode is a quota inode.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# bb9b8e86 02-Jun-2013 Dave Chinner <dchinner@redhat.com>

xfs: rework dquot CRCs

Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.

Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.

Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.

Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.

So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 6fcdc59de28817d1fbf1bd58cc01f4f3fac858fb)


# 6fcdc59d 02-Jun-2013 Dave Chinner <dchinner@redhat.com>

xfs: rework dquot CRCs

Calculating dquot CRCs when the backing buffer is written back just
doesn't work reliably. There are several places which manipulate
dquots directly in the buffers, and they don't calculate CRCs
appropriately, nor do they always set the buffer up to calculate
CRCs appropriately.

Firstly, if we log a dquot buffer (e.g. during allocation) it gets
logged without valid CRC, and so on recovery we end up with a dquot
that is not valid.

Secondly, if we recover/repair a dquot, we don't have a verifier
attached to the buffer and hence CRCs are not calculated on the way
down to disk.

Thirdly, calculating the CRC after we've changed the contents means
that if we re-read the dquot from the buffer, we cannot verify the
contents of the dquot are valid, as the CRC is invalid.

So, to avoid all the dquot CRC errors that are being detected by the
read verifier, change to using the same model as for inodes. That
is, dquot CRCs are calculated and written to the backing buffer at
the time the dquot is flushed to the backing buffer. If we modify
the dquot directly in the backing buffer, calculate the CRC
immediately after the modification is complete. Hence the dquot in
the on-disk buffer should always have a valid CRC.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 3fe58f30 02-Apr-2013 Christoph Hellwig <hch@lst.de>

xfs: add CRC checks for quota blocks

Use the reserved space in struct xfs_dqblk to store a UUID and a crc
for the quota blocks.

[dchinner@redhat.com] Add a LSN field and update for current verifier
infrastructure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 4b6eae2e 18-Mar-2013 Brian Foster <bfoster@redhat.com>

xfs: pass xfs_dquot to xfs_qm_adjust_dqlimits() instead of xfs_disk_dquot_t

Modify xfs_qm_adjust_dqlimits() to take the xfs_dquot as a
parameter instead of just the xfs_disk_dquot_t so we can update
in-memory fields if necessary.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# f910a8c6 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com>

xfs: calculate xfs_qm_write_sb_changes() space log reservation at mount time

For the transaction that write the incore superblock changes of quota flags
to disk, it would reserve the same log space to clear/reset quota flags
transaction, hence we can use XFS_TRANS_SBCHANGE_LOG_RES() for it as well.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# b870553c 27-Nov-2012 Dave Chinner <dchinner@redhat.com>

xfs: fix stray dquot unlock when reclaiming dquots

When we fail to get a dquot lock during reclaim, we jump to an error
handler that unlocks the dquot. This is wrong as we didn't lock the
dquot, and unlocking it means who-ever is holding the lock has had
it silently taken away, and hence it results in a lock imbalance.

Found by inspection while modifying the code for the numa-lru
patchset. This fixes a random hang I've been seeing on xfstest 232
for the past several months.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 1813dd64 13-Nov-2012 Dave Chinner <dchinner@redhat.com>

xfs: convert buffer verifiers to an ops structure.

To separate the verifiers from iodone functions and associate read
and write verifiers at the same time, introduce a buffer verifier
operations structure to the xfs_buf.

This avoids the need for assigning the write verifier, clearing the
iodone function and re-running ioend processing in the read
verifier, and gets rid of the nasty "b_pre_io" name for the write
verifier function pointer. If we ever need to, it will also be
easier to add further content specific callbacks to a buffer with an
ops structure in place.

We also avoid needing to export verifier functions, instead we
can simply export the ops structures for those that are needed
outside the function they are defined in.

This patch also fixes a directory block readahead verifier issue
it exposed.

This patch also adds ops callbacks to the inode/alloc btree blocks
initialised by growfs. These will need more work before they will
work with CRCs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Phil White <pwhite@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 612cfbfe 13-Nov-2012 Dave Chinner <dchinner@redhat.com>

xfs: add pre-write metadata buffer verifier callbacks

These verifiers are essentially the same code as the read verifiers,
but do not require ioend processing. Hence factor the read verifier
functions and add a new write verifier wrapper that is used as the
callback.

This is done as one large patch for all verifiers rather than one
patch per verifier as the change is largely mechanical. This
includes hooking up the write verifier via the read verifier
function.

Hooking up the write verifier for buffers obtained via
xfs_trans_get_buf() will be done in a separate patch as that touches
code in many different places rather than just the verifier
functions.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# c6319198 13-Nov-2012 Dave Chinner <dchinner@redhat.com>

xfs: verify dquot blocks as they are read from disk

Add a dquot buffer verify callback function and pass it into the
buffer read functions. This checks all the dquots in a buffer, but
cannot completely verify the dquot ids are correct. Also, errors
cannot be repaired, so an additional function is added to repair bad
dquots in the buffer if such an error is detected in a context where
repair is allowed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Phil White <pwhite@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# c3f8fc73 12-Nov-2012 Dave Chinner <dchinner@redhat.com>

xfs: make buffer read verication an IO completion function

Add a verifier function callback capability to the buffer read
interfaces. This will be used by the callers to supply a function
that verifies the contents of the buffer when it is read from disk.
This patch does not provide callback functions, but simply modifies
the interfaces to allow them to be called.

The reason for adding this to the read interfaces is that it is very
difficult to tell fom the outside is a buffer was just read from
disk or whether we just pulled it out of cache. Supplying a callbck
allows the buffer cache to use it's internal knowledge of the buffer
to execute it only when the buffer is read from disk.

It is intended that the verifier functions will mark the buffer with
an EFSCORRUPTED error when verification fails. This allows the
reading context to distinguish a verification error from an IO
error, and potentially take further actions on the buffer (e.g.
attempt repair) based on the error reported.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Phil White <pwhite@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 33479e05 08-Oct-2012 Dave Chinner <dchinner@redhat.com>

xfs: remove xfs_iget.c

The inode cache functions remaining in xfs_iget.c can be moved to xfs_icache.c
along with the other inode cache functions. This removes all functionality from
xfs_iget.c, so the file can simply be removed.

This move results in various functions now only having the scope of a single
file (e.g. xfs_inode_free()), so clean up all the definitions and exported
prototypes in xfs_icache.[ch] and xfs_inode.h appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 32972383 07-Jun-2012 Dave Chinner <dchinner@redhat.com>

xfs: make largest supported offset less shouty

XFS_MAXIOFFSET() is just a simple macro that resolves to
mp->m_maxioffset. It doesn't need to exist, and it just makes the
code unnecessarily loud and shouty.

Make it quiet and easy to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 60a34607 22-Apr-2012 Dave Chinner <dchinner@redhat.com>

xfs: move xfsagino_t to xfs_types.h

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 43ff2122 22-Apr-2012 Christoph Hellwig <hch@infradead.org>

xfs: on-stack delayed write buffer lists

Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
and write back the buffers per-process instead of by waking up xfsbufd.

This is now easily doable given that we have very few places left that write
delwri buffers:

- log recovery:
Only done at mount time, and already forcing out the buffers
synchronously using xfs_flush_buftarg

- quotacheck:
Same story.

- dquot reclaim:
Writes out dirty dquots on the LRU under memory pressure. We might
want to look into doing more of this via xfsaild, but it's already
more optimal than the synchronous inode reclaim that writes each
buffer synchronously.

- xfsaild:
This is the main beneficiary of the change. By keeping a local list
of buffers to write we reduce latency of writing out buffers, and
more importably we can remove all the delwri list promotions which
were hitting the buffer cache hard under sustained metadata loads.

The implementation is very straight forward - xfs_buf_delwri_queue now gets
a new list_head pointer that it adds the delwri buffers to, and all callers
need to eventually submit the list using xfs_buf_delwi_submit or
xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
list. The biggest change to pass down the buffer list was done to the AIL
pushing. Now that we operate on buffers the trylock, push and pushbuf log
item methods are merged into a single push routine, which tries to lock the
item, and if possible add the buffer that needs writeback to the buffer list.
This leads to much simpler code than the previous split but requires the
individual IOP_PUSH instances to unlock and reacquire the AIL around calls
to blocking routines.

Given that xfsailds now also handle writing out buffers, the conditions for
log forcing and the sleep times needed some small changes. The most
important one is that we consider an AIL busy as long we still have buffers
to push, and the other one is that we do increment the pushed LSN for
buffers that are under flushing at this moment, but still count them towards
the stuck items for restart purposes. Without this we could hammer on stuck
items without ever forcing the log and not make progress under heavy random
delete workloads on fast flash storage devices.

[ Dave Chinner:
- rebase on previous patches.
- improved comments for XBF_DELWRI_Q handling
- fix XBF_ASYNC handling in queue submission (test 106 failure)
- rename delwri submit function buffer list parameters for clarity
- xfs_efd_item_push() should return XFS_ITEM_PINNED ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# fe7257fd 22-Apr-2012 Christoph Hellwig <hch@infradead.org>

xfs: do not write the buffer from xfs_qm_dqflush

Instead of writing the buffer directly from inside xfs_qm_dqflush return it
to the caller and let the caller decide what to do with the buffer. Also
remove the pincount check in xfs_qm_dqflush that all non-blocking callers
already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
check that all callers already perform.

[ Dave Chinner: fixed build error cause by missing '{'. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# b4d05e30 27-Mar-2012 Christoph Hellwig <hch@infradead.org>

xfs: avoid taking the ilock unnessecarily in xfs_qm_dqattach

Check if we actually need to attach a dquot before taking the ilock in
xfs_qm_dqattach. This avoid superflous lock roundtrips for the common cases
of quota support compiled in but not activated on a filesystem and an
inode that already has the dquots attached.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# a05931ce 13-Mar-2012 Christoph Hellwig <hch@infradead.org>

xfs: remove the global xfs_Gqm structure

If we initialize the slab caches for the quota code when XFS is loaded there
is no need for a global and reference counted quota manager structure. Drop
all this overhead and also fix the error handling during quota initialization.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# b84a3a96 14-Mar-2012 Christoph Hellwig <hch@lst.de>

xfs: remove the per-filesystem list of dquots

Instead of keeping a separate per-filesystem list of dquots we can walk
the radix tree for the two places where we need to iterate all quota
structures.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 9f920f11 13-Mar-2012 Christoph Hellwig <hch@infradead.org>

xfs: use per-filesystem radix trees for dquot lookup

Replace the global hash tables for looking up in-memory dquot structures
with per-filesystem radix trees to allow scaling to a large number of
in-memory dquot structures.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# f8739c3c 13-Mar-2012 Christoph Hellwig <hch@infradead.org>

xfs: per-filesystem dquot LRU lists

Replace the global dquot lru lists with a per-filesystem one.

Note that the shrinker isn't wire up to the per-superblock VFS shrinker
infrastructure as would have problems summing up and splitting the counts
for inodes and dquots. I don't think this is a major problem as the quota
cache isn't as interwinded with the inode cache as the dentry cache is,
because an inode that is dropped from the cache will generally release
a dquot reference, but most of the time it won't be the last one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 48776fd2 13-Mar-2012 Christoph Hellwig <hch@infradead.org>

xfs: use common code for quota statistics

Switch the quota code over to use the generic XFS statistics infrastructure.
While the legacy /proc/fs/xfs/xqm and /proc/fs/xfs/xqmstats interfaces are
preserved for now the statistics that still have a meaning with the current
code are now also available from /proc/fs/xfs/stats.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 92b2e5b3 01-Feb-2012 Christoph Hellwig <hch@infradead.org>

xfs: use a normal shrinker for the dquot freelist

Stop reusing dquots from the freelist when allocating new ones directly, and
implement a shrinker that actually follows the specifications for the
interface. The shrinker implementation is still highly suboptimal at this
point, but we can gradually work on it.

This also fixes an bug in the previous lock ordering, where we would take
the hash and dqlist locks inside of the freelist lock against the normal
lock ordering. This is only solvable by introducing the dispose list,
and thus not when using direct reclaim of unused dquots for new allocations.

As a side-effect the quota upper bound and used to free ratio values in
/proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
new world order.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 04da0c8196ac0b12fb6b84f4b7a51ad2fa56d869)


# 04da0c81 01-Feb-2012 Christoph Hellwig <hch@infradead.org>

xfs: use a normal shrinker for the dquot freelist

Stop reusing dquots from the freelist when allocating new ones directly, and
implement a shrinker that actually follows the specifications for the
interface. The shrinker implementation is still highly suboptimal at this
point, but we can gradually work on it.

This also fixes an bug in the previous lock ordering, where we would take
the hash and dqlist locks inside of the freelist lock against the normal
lock ordering. This is only solvable by introducing the dispose list,
and thus not when using direct reclaim of unused dquots for new allocations.

As a side-effect the quota upper bound and used to free ratio values in
/proc/fs/xfs/xqm are set to 0 as these values don't make any sense in the
new world order.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 4177af3a 23-Jan-2012 Chandra Seetharaman <sekharan@us.ibm.com>

Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage

Define new macro XFS_ALL_QUOTA_ACTIVE and simply some usage
of quota macros.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 28fb588c 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: simplify xfs_qm_detach_gdquots

There is no reason to drop qi_dqlist_lock around calls to xfs_qm_dqrele
because the free list lock now nests inside qi_dqlist_lock and the
dquot lock.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 7ae44407 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove XFS_QMOPT_DQSUSER

Just read the id 0 dquot from disk directly in xfs_qm_init_quotainfo instead
of going through dqget and requiring a special flag to not add the dquot to
any lists.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 78e55892 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: add a xfs_dqhold helper

Factor the common pattern of:

xfs_dqlock(dqp);
XFS_DQHOLD(dqp);
xfs_dqunlock(dqp);

into a new helper, and remove XFS_DQHOLD now that only one other caller
is left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# ab680bb7 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: simplify xfs_qm_dqattach_grouphint

No need to play games with the qlock now that the freelist lock nests inside
it. Also clean up various outdated comments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# bf72de31 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: nest qm_dqfrlist_lock inside the dquot qlock

Allow xfs_qm_dqput to work without trylock loops by nesting the freelist lock
inside the dquot qlock. In turn that requires trylocks in the reclaim path
instead, but given it's a classic tradeoff between fast and slow path, and
we follow the model of the inode and dentry caches.

Document our new lock order now that it has settled.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 92678554 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: flatten the dquot lock ordering

Introduce a new XFS_DQ_FREEING flag that tells lookup and mplist walks
to skip a dquot that is beeing freed, and use this avoid the trylock
on the hash and mplist locks in xfs_qm_dqreclaim_one. Also simplify
xfs_dqpurge by moving the inodes to a dispose list after marking them
XFS_DQ_FREEING and avoid the locker ordering constraints.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# be7ffc38 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: implement lazy removal for the dquot freelist

Do not remove dquots from the freelist when we grab a reference to them in
xfs_qm_dqlookup, but leave them on the freelist util scanning notices that
they have a reference. This speeds up the lookup fastpath, and greatly
simplifies the lock ordering constraints. Note that the same scheme is
used by the VFS inode and dentry caches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 80a376bf 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove XFS_DQ_INACTIVE

Free dquots when purging them during umount instead of keeping them around
on the freelist in a degraded state. The out of order locking in
xfs_qm_dqpurge will be removed again later in this series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 800b484e 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: cleanup dquot locking helpers

Mark the trivial lock wrappers as inline, and make the naming consistent
for all of them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# a7ef9bd7 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove the sync_mode argument to xfs_qm_dqflush_all

It always is zero, and removing it will make future changes easier.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# 34625c66 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove xfs_qm_sync

Now that we can't have any dirty dquots around that aren't in the AIL we
can get rid of the explicit dquot syncing from xfssyncd and xfs_fs_sync_fs
and instead rely on AIL pushing to write out any quota updates.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# f2fba558 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: make sure to really flush all dquots in xfs_qm_quotacheck

Make sure we do not skip any dquots when flushing them out after a
quotacheck to make sure that we will never have any dirty dquots on a
live filesystem. At this point no dquot should be pinnable, but lets
be pedantic about it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# fdedf28b 06-Dec-2011 Christoph Hellwig <hch@infradead.org>

xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush

Only skip pinned dquots if SYNC_TRYLOCK is specified, and adjust the callers
to keep the behaviour unchanged.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>


# db3e74b5 09-Nov-2011 Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>

xfs: use doalloc flag in xfs_qm_dqattach_one()

The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
whether a new area to handle quota information will be allocated
if needed. Originally, it was passed to xfs_qm_dqget(), but has
been removed by the following commit (probably by mistake):

commit 8e9b6e7fa4544ea8a0e030c8987b918509c8ff47
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Feb 8 21:51:42 2009 +0100

xfs: remove the unused XFS_QMOPT_DQLOCK flag

As the result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
never allocates the new area even if it is needed.

This patch gives the doalloc arg to xfs_qm_dqget() in
xfs_qm_dqattach_one() to fix this problem.

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>


# a9add83e 10-Oct-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove XFS_bflush

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>


# 5c8ed202 18-Sep-2011 Dave Chinner <dchinner@redhat.com>

xfs: introduce xfs_bmapi_read()

xfs_bmapi() currently handles both extent map reading and
allocation. As a result, the code is littered with "if (wr)"
branches to conditionally do allocation operations if required.
This makes the code much harder to follow and causes significant
indent issues with the code.

Given that read mapping is much simpler than allocation, we can
split out read mapping from xfs_bmapi() and reuse the logic that
we have already factored out do do all the hard work of handling the
extent map manipulations. The results in a much simpler function for
the common extent read operations, and will allow the allocation
code to be simplified in another commit.

Once xfs_bmapi_read() is implemented, convert all the callers of
xfs_bmapi() that are only reading extents to use the new function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>


# 61551f1e 23-Aug-2011 Christoph Hellwig <hch@infradead.org>

xfs: call xfs_buf_delwri_queue directly

Unify the ways we add buffers to the delwri queue by always calling
xfs_buf_delwri_queue directly. The xfs_bdwrite functions is removed and
opencoded in its callers, and the two places setting XBF_DELWRI while a
buffer is locked and expecting xfs_buf_unlock to pick it up are converted
to call xfs_buf_delwri_queue directly, too. Also replace the
XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
to make the explicit queuing/dequeuing more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>


# c59d87c4 12-Aug-2011 Christoph Hellwig <hch@infradead.org>

xfs: remove subdirectories

Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code. Besides the large
amount of file rename the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>