History log of /linux-master/fs/notify/mark.c
Revision Date Author Comments
# 7232522e 30-Nov-2023 Amir Goldstein <amir73il@gmail.com>

fanotify: store fsid in mark instead of in connector

Some filesystems like fuse and nfs have zero or non-unique fsid.
We would like to avoid reporting ambiguous fsid in events, so we need
to avoid marking objects with same fsid and different sb.

To make this easier to enforce, store the fsid in the marks of the group
instead of in the shared conenctor.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20231130165619.3386452-2-amir73il@gmail.com>


# c3638b5b 22-Apr-2022 Amir Goldstein <amir73il@gmail.com>

fsnotify: allow adding an inode mark without pinning inode

fsnotify_add_mark() and variants implicitly take a reference on inode
when attaching a mark to an inode.

Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.

Instead of taking the inode reference when attaching connector to inode
and dropping the inode reference when detaching connector from inode,
take the inode reference on attach of the first mark that wants to hold
an inode reference and drop the inode reference on detach of the last
mark that wants to hold an inode reference.

Backends can "upgrade" an existing mark to take an inode reference, but
cannot "downgrade" a mark with inode reference to release the refernce.

This leaves the choice to the backend whether or not to pin the inode
when adding an inode mark.

This is intended to be used when adding a mark with ignored mask that is
used for optimization in cases where group can afford getting unneeded
events and reinstate the mark with ignored mask when inode is accessed
again after being evicted.

Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 43b245a7 22-Apr-2022 Amir Goldstein <amir73il@gmail.com>

fsnotify: create helpers for group mark_mutex lock

Create helpers to take and release the group mark_mutex lock.

Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines
if the mark_mutex lock is fs reclaim safe or not. If not safe, the
lock helpers take the lock and disable direct fs reclaim.

In that case we annotate the mutex with a different lockdep class to
express to lockdep that an allocation of mark of an fs reclaim safe group
may take the group lock of another "NOFS" group to evict inodes.

For now, converted only the callers in common code and no backend
defines the NOFS flag. It is intended to be set by fanotify for
evictable marks support.

Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# f3010343 22-Apr-2022 Amir Goldstein <amir73il@gmail.com>

fsnotify: make allow_dups a property of the group

Instead of passing the allow_dups argument to fsnotify_add_mark()
as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express
the allow_dups behavior and set this behavior at group creation time
for all calls of fsnotify_add_mark().

Rename the allow_dups argument to generic add_flags argument for future
use.

Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 623af4f5 22-Apr-2022 Amir Goldstein <amir73il@gmail.com>

fsnotify: fix wrong lockdep annotations

Commit 6960b0d909cd ("fsnotify: change locking order") changed some
of the mark_mutex locks in direct reclaim path to use:
mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);

This change is explained:
"...It uses nested locking to avoid deadlock in case we do the final
iput() on an inode which still holds marks and thus would take the
mutex again when calling fsnotify_inode_delete() in destroy_inode()."

The problem is that the mutex_lock_nested() is not a nested lock at
all. In fact, it has the opposite effect of preventing lockdep from
warning about a very possible deadlock.

Due to these wrong annotations, a deadlock that was introduced with
nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two
years until it was reported recently by Khazhismel Kumykov, only to
find out that the deadlock was already fixed in kernel v5.5.

Fix the wrong lockdep annotations.

Cc: Khazhismel Kumykov <khazhy@google.com>
Fixes: 6960b0d909cd ("fsnotify: change locking order")
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Link: https://lore.kernel.org/r/20220422120327.3459282-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 4f0b903d 23-Feb-2022 Amir Goldstein <amir73il@gmail.com>

fsnotify: fix merge with parent's ignored mask

fsnotify_parent() does not consider the parent's mark at all unless
the parent inode shows interest in events on children and in the
specific event.

So unless parent added an event to both its mark mask and ignored mask,
the event will not be ignored.

Fix this by declaring the interest of an object in an event when the
event is in either a mark mask or ignored mask.

Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 1c9007d6 29-Nov-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: separate mark iterator type from object type enum

They are two different types that use the same enum, so this confusing.

Use the object type to indicate the type of object mark is attached to
and the iter type to indicate the type of watch.

A group can have two different watches of the same object type (parent
and child watches) that match the same event.

Link: https://lore.kernel.org/r/20211129201537.1932819-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# ad69cd99 29-Nov-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: clarify object type argument

In preparation for separating object type from iterator type, rename
some 'type' arguments in functions to 'obj_type' and remove the unused
interface to clear marks by object type mask.

Link: https://lore.kernel.org/r/20211129201537.1932819-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 4396a731 09-Sep-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: fix sb_connectors leak

Fix a leak in s_fsnotify_connectors counter in case of a race between
concurrent add of new fsnotify mark to an object.

The task that lost the race fails to drop the counter before freeing
the unused connector.

Following umount() hangs in fsnotify_sb_delete()/wait_var_event(),
because s_fsnotify_connectors never drops to zero.

Fixes: ec44610fe2b8 ("fsnotify: count all objects with attached connectors")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/linux-fsdevel/20210907063338.ycaw6wvhzrfsfdlp@xzhoux.usersys.redhat.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ec44610f 10-Aug-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: count all objects with attached connectors

Rename s_fsnotify_inode_refs to s_fsnotify_connectors and count all
objects with attached connectors, not only inodes with attached
connectors.

This will be used to optimize fsnotify() calls on sb without any
type of marks.

Link: https://lore.kernel.org/r/20210810151220.285179-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 11fa333b 10-Aug-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: count s_fsnotify_inode_refs for attached connectors

Instead of incrementing s_fsnotify_inode_refs when detaching connector
from inode, increment it earlier when attaching connector to inode.
Next patch is going to use s_fsnotify_inode_refs to count all objects
with attached connectors.

Link: https://lore.kernel.org/r/20210810151220.285179-3-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 09ddbe69 10-Aug-2021 Amir Goldstein <amir73il@gmail.com>

fsnotify: replace igrab() with ihold() on attach connector

We must have a reference on inode, so ihold is cheaper.

Link: https://lore.kernel.org/r/20210810151220.285179-2-amir73il@gmail.com
Reviewed-by: Matthew Bobrowski <repnop@google.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 5b8fea65 04-Mar-2021 Amir Goldstein <amir73il@gmail.com>

fanotify: configurable limits via sysfs

fanotify has some hardcoded limits. The only APIs to escape those limits
are FAN_UNLIMITED_QUEUE and FAN_UNLIMITED_MARKS.

Allow finer grained tuning of the system limits via sysfs tunables under
/proc/sys/fs/fanotify, similar to tunables under /proc/sys/fs/inotify,
with some minor differences.

- max_queued_events - global system tunable for group queue size limit.
Like the inotify tunable with the same name, it defaults to 16384 and
applies on initialization of a new group.

- max_user_marks - user ns tunable for marks limit per user.
Like the inotify tunable named max_user_watches, on a machine with
sufficient RAM and it defaults to 1048576 in init userns and can be
further limited per containing user ns.

- max_user_groups - user ns tunable for number of groups per user.
Like the inotify tunable named max_user_instances, it defaults to 128
in init userns and can be further limited per containing user ns.

The slightly different tunable names used for fanotify are derived from
the "group" and "mark" terminology used in the fanotify man pages and
throughout the code.

Considering the fact that the default value for max_user_instances was
increased in kernel v5.10 from 8192 to 1048576, leaving the legacy
fanotify limit of 8192 marks per group in addition to the max_user_marks
limit makes little sense, so the per group marks limit has been removed.

Note that when a group is initialized with FAN_UNLIMITED_MARKS, its own
marks are not accounted in the per user marks account, so in effect the
limit of max_user_marks is only for the collection of groups that are
not initialized with FAN_UNLIMITED_MARKS.

Link: https://lore.kernel.org/r/20210304112921.3996419-2-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 00e0afb6 13-Apr-2020 Jules Irenge <jbi.octave@gmail.com>

fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()

Sparse reports warnings at fsnotify_prepare_user_wait()
and at fsnotify_finish_user_wait()

warning: context imbalance in fsnotify_finish_user_wait()
- wrong count at exit
warning: context imbalance in fsnotify_prepare_user_wait()
- unexpected unlock

The root cause is the missing annotation at fsnotify_finish_user_wait()
and at fsnotify_prepare_user_wait()
fsnotify_prepare_user_wait() has an extra annotation __release()
that only tell Sparse and not GCC to shutdown the warning

Add the missing __acquires(&fsnotify_mark_srcu) annotation
Add the missing __releases(&fsnotify_mark_srcu) annotation
Add the __release(&fsnotify_mark_srcu) annotation.

Link: https://lore.kernel.org/r/20200413214240.15245-1-jbi.octave@gmail.com
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# b72679ee 18-Aug-2019 Trond Myklebust <trond.myklebust@primarydata.com>

notify: export symbols for use by the knfsd file cache

The knfsd file cache will need to detect when files are unlinked, so that
it can close the associated cached files. Export a minimal set of notifier
functions to allow it to do so.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>


# c285a2f0 19-Jun-2019 Amir Goldstein <amir73il@gmail.com>

fanotify: update connector fsid cache on add mark

When implementing connector fsid cache, we only initialized the cache
when the first mark added to object was added by FAN_REPORT_FID group.
We forgot to update conn->fsid when the second mark is added by
FAN_REPORT_FID group to an already attached connector without fsid
cache.

Reported-and-tested-by: syzbot+c277e8e2f46414645508@syzkaller.appspotmail.com
Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# c82ee6d3 19-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 18

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 or at your option any
later version this program is distributed in the hope that it will
be useful but without any warranty without even the implied warranty
of merchantability or fitness for a particular purpose see the gnu
general public license for more details you should have received a
copy of the gnu general public license along with this program see
the file copying if not write to the free software foundation 675
mass ave cambridge ma 02139 usa

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 52 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154042.342335923@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 11a6f8e2 01-May-2019 Jan Kara <jack@suse.cz>

fsnotify: Clarify connector assignment in fsnotify_add_mark_list()

Add a comment explaining why WRITE_ONCE() is enough when setting
mark->connector which can get dereferenced by RCU protected readers.

Signed-off-by: Jan Kara <jack@suse.cz>


# b1da6a51 24-Apr-2019 Jan Kara <jack@suse.cz>

fsnotify: Fix NULL ptr deref in fanotify_get_fsid()

fanotify_get_fsid() is reading mark->connector->fsid under srcu. It can
happen that it sees mark not fully initialized or mark that is already
detached from the object list. In these cases mark->connector
can be NULL leading to NULL ptr dereference. Fix the problem by
being careful when reading mark->connector and check it for being NULL.
Also use WRITE_ONCE when writing the mark just to prevent compiler from
doing something stupid.

Reported-by: syzbot+15927486a4f1bfcbaf91@syzkaller.appspotmail.com
Fixes: 77115225acc6 ("fanotify: cache fsid in fsnotify_mark_connector")
Signed-off-by: Jan Kara <jack@suse.cz>


# 77115225 10-Jan-2019 Amir Goldstein <amir73il@gmail.com>

fanotify: cache fsid in fsnotify_mark_connector

For FAN_REPORT_FID, we need to encode fid with fsid of the filesystem on
every event. To avoid having to call vfs_statfs() on every event to get
fsid, we store the fsid in fsnotify_mark_connector on the first time we
add a mark and on handle event we use the cached fsid.

Subsequent calls to add mark on the same object are expected to pass the
same fsid, so the call will fail on cached fsid mismatch.

If an event is reported on several mark types (inode, mount, filesystem),
all connectors should already have the same fsid, so we use the cached
fsid from the first connector.

[JK: Simplify code flow around fanotify_get_fid()
make fsid argument of fsnotify_add_mark_locked() unconditional]

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 721fb6fb 17-Oct-2018 Jan Kara <jack@suse.cz>

fsnotify: Fix busy inodes during unmount

Detaching of mark connector from fsnotify_put_mark() can race with
unmounting of the filesystem like:

CPU1 CPU2
fsnotify_put_mark()
spin_lock(&conn->lock);
...
inode = fsnotify_detach_connector_from_object(conn)
spin_unlock(&conn->lock);
generic_shutdown_super()
fsnotify_unmount_inodes()
sees connector detached for inode
-> nothing to do
evict_inode()
barfs on pending inode reference
iput(inode);

Resulting in "Busy inodes after unmount" message and possible kernel
oops. Make fsnotify_unmount_inodes() properly wait for outstanding inode
references from detached connectors.

Note that the accounting of outstanding inode references in the
superblock can cause some cacheline contention on the counter. OTOH it
happens only during deletion of the last notification mark from an inode
(or during unlinking of watched inode) and that is not too bad. I have
measured time to create & delete inotify watch 100000 times from 64
processes in parallel (each process having its own inotify group and its
own file on a shared superblock) on a 64 CPU machine. Average and
standard deviation of 15 runs look like:

Avg Stddev
Vanilla 9.817400 0.276165
Fixed 9.710467 0.228294

So there's no statistically significant difference.

Fixes: 6b3f05d24d35 ("fsnotify: Detach mark from object list when last reference is dropped")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>


# 1e6cb723 01-Sep-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: add super block object type

Add the infrastructure to attach a mark to a super_block struct
and detach all attached marks when super block is destroyed.

This is going to be used by fanotify backend to setup super block
marks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# d3bc0fa8 20-Aug-2018 Jan Kara <jack@suse.cz>

fsnotify: fix false positive warning on inode delete

When inode is getting deleted and someone else holds reference to a mark
attached to the inode, we just detach the connector from the inode. In
that case fsnotify_put_mark() called from fsnotify_destroy_marks() will
decide to recalculate mask for the inode and __fsnotify_recalc_mask()
will WARN about invalid connector type:

WARNING: CPU: 1 PID: 12015 at fs/notify/mark.c:139
__fsnotify_recalc_mask+0x2d7/0x350 fs/notify/mark.c:139

Actually there's no reason to warn about detached connector in
__fsnotify_recalc_mask() so just silently skip updating the mask in such
case.

Reported-by: syzbot+c34692a51b9a6ca93540@syzkaller.appspotmail.com
Fixes: 3ac70bfcde81 ("fsnotify: add helper to get mask from connector")
Signed-off-by: Jan Kara <jack@suse.cz>


# 3ac70bfc 23-Jun-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: add helper to get mask from connector

Use a helper to get the mask from the object (i.e. i_fsnotify_mask)
to generalize code of add/remove inode/vfsmount mark.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 36f10f55 23-Jun-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: let connector point to an abstract object

Make the code to attach/detach a connector to object more generic
by letting the fsnotify connector point to an abstract fsnotify_connp_t.
Code that needs to dereference an inode or mount object now uses the
helpers fsnotify_conn_{inode,mount}.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# b812a9f5 23-Jun-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: pass connp and object type to fsnotify_add_mark()

Instead of passing inode and vfsmount arguments to fsnotify_add_mark()
and its _locked variant, pass an abstract object pointer and the object
type.

The helpers fsnotify_obj_{inode,mount} are added to get the concrete
object pointer from abstract object pointer.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 9b6e5434 23-Jun-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: use typedef fsnotify_connp_t for brevity

The object marks manipulation functions fsnotify_destroy_marks()
fsnotify_find_mark() and their helpers take an argument of type
struct fsnotify_mark_connector __rcu ** to dereference the connector
pointer. use a typedef to describe this type for brevity.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 47d9c7cc 20-Apr-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: generalize iteration of marks by object type

Make some code that handles marks of object types inode and vfsmount
generic, so it can handle other object types.

Introduce fsnotify_foreach_obj_type macro to iterate marks by object type
and fsnotify_iter_{should|set}_report_type macros to set/test report_mask.

This is going to be used for adding mark of another object type
(super block mark).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# d6f7b98b 20-Apr-2018 Amir Goldstein <amir73il@gmail.com>

fsnotify: use type id to identify connector object type

An fsnotify_mark_connector is referencing a single type of object
(either inode or vfsmount). Instead of storing a type mask in
connector->flags, store a single type id in connector->type to
identify the type of object.

When a connector object is detached from the object, its type is set
to FSNOTIFY_OBJ_TYPE_DETACHED and this object is not going to be
reused.

The function fsnotify_clear_marks_by_group() is the only place where
type mask was used, so use type flags instead of type id to this
function.

This change is going to be more convenient when adding a new object
type (super block).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# ab97f873 20-Oct-2017 Elena Reshetova <elena.reshetova@intel.com>

fsnotify: convert fsnotify_mark.refcnt from atomic_t to refcount_t

atomic_t variables are currently used to implement reference
counters with the following properties:
- counter is initialized to 1 using atomic_set()
- a resource is freed upon counter reaching zero
- once counter reaches zero, its further
increments aren't allowed
- counter schema uses basic atomic operations
(set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable fsnotify_mark.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 9a31d7ad 30-Oct-2017 Miklos Szeredi <mszeredi@redhat.com>

fsnotify: fix pinning group in fsnotify_prepare_user_wait()

Blind increment of group's user_waits is not enough, we could be far enough
in the group's destruction that it isn't taken into account (i.e. grabbing
the mark ref afterwards doesn't guarantee that it was the ref coming from
the _group_ that was grabbed).

Instead we need to check (under lock) that the mark is still attached to
the group after having obtained a ref to the mark. If not, skip it.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9385a84d7e1f ("fsnotify: Pass fsnotify_iter_info into handle_event handler")
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Jan Kara <jack@suse.cz>


# 24c20305 30-Oct-2017 Miklos Szeredi <mszeredi@redhat.com>

fsnotify: clean up fsnotify_prepare/finish_user_wait()

This patch doesn't actually fix any bug, just paves the way for fixing mark
and group pinning.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.12
Signed-off-by: Jan Kara <jack@suse.cz>


# 9cf90cef 31-Oct-2017 Jan Kara <jack@suse.cz>

fsnotify: Protect bail out path of fsnotify_add_mark_locked() properly

When fsnotify_add_mark_locked() fails it cleans up the mark it was
adding. Since the mark is already visible in group's list, we should
protect update of mark->flags with mark->lock. I'm not aware of any real
issues this could cause (since we also hold group->mark_mutex) but
better be safe and obey locking rules properly.

Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# f4edce1a 21-Apr-2017 Dan Carpenter <dan.carpenter@oracle.com>

fsnotify: remove a stray unlock

We recently shifted this code around, so we're no longer holding the
lock on this path.

Fixes: 755b5bc681eb ("fsnotify: Remove indirection from mark list addition")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 054c636e 21-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Move ->free_mark callback to fsnotify_ops

Pointer to ->free_mark callback unnecessarily occupies one long in each
fsnotify_mark although they are the same for all marks from one
notification group. Move the callback pointer to fsnotify_ops.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 7b129323 21-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Add group pointer in fsnotify_init_mark()

Currently we initialize mark->group only in fsnotify_add_mark_lock().
However we will need to access fsnotify_ops of corresponding group from
fsnotify_put_mark() so we need mark->group initialized earlier. Do that
in fsnotify_init_mark() which has a consequence that once
fsnotify_init_mark() is called on a mark, the mark has to be destroyed
by fsnotify_put_mark().

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 2e37c6ca 04-Jan-2017 Jan Kara <jack@suse.cz>

fsnotify: Remove fsnotify_detach_group_marks()

The function is already mostly contained in what
fsnotify_clear_marks_by_group() does. Just update that function to not
select marks when all of them should be destroyed and remove
fsnotify_detach_group_marks().

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 18f2e0d3 04-Jan-2017 Jan Kara <jack@suse.cz>

fsnotify: Rename fsnotify_clear_marks_by_group_flags()

The _flags() suffix in the function name was more confusing than
explaining so just remove it. Also rename the argument from 'flags' to
'type' to better explain what the function expects.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 66d2b81b 21-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Remove fsnotify_set_mark_{,ignored_}mask_locked()

These helpers are now only a simple assignment and just obfuscate
what is going on. Remove them.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# abc77577 10-Nov-2016 Jan Kara <jack@suse.cz>

fsnotify: Provide framework for dropping SRCU lock in ->handle_event

fanotify wants to drop fsnotify_mark_srcu lock when waiting for response
from userspace so that the whole notification subsystem is not blocked
during that time. This patch provides a framework for safely getting
mark reference for a mark found in the object list which pins the mark
in that list. We can then drop fsnotify_mark_srcu, wait for userspace
response and then safely continue iteration of the object list once we
reaquire fsnotify_mark_srcu.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# f09b04a0 21-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Remove special handling of mark destruction on group shutdown

Currently we queue all marks for destruction on group shutdown and then
destroy them from fsnotify_destroy_group() instead from a worker thread
which is the usual path. However worker can already be processing some
list of marks to destroy so this does not make 100% all marks are really
destroyed by the time group is shut down. This isn't a big problem as
each mark holds group reference and thus group stays partially alive
until all marks are really freed but there's no point in complicating
our lives - just wait for the delayed work to be finished instead.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 6b3f05d2 20-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Detach mark from object list when last reference is dropped

Instead of removing mark from object list from fsnotify_detach_mark(),
remove the mark when last reference to the mark is dropped. This will
allow fanotify to wait for userspace response to event without having to
hold onto fsnotify_mark_srcu.

To avoid pinning inodes by elevated refcount (and thus e.g. delaying
file deletion) while someone holds mark reference, we detach connector
from the object also from fsnotify_destroy_marks() and not only after
removing last mark from the list as it was now.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 11375145 09-Nov-2016 Jan Kara <jack@suse.cz>

fsnotify: Move queueing of mark for destruction into fsnotify_put_mark()

Currently we queue mark into a list of marks for destruction in
__fsnotify_free_mark() and keep the last mark reference dangling. After the
worker waits for SRCU period, it drops the last reference to the mark
which frees it. This scheme has the disadvantage that if we hold
reference to a mark and drop and reacquire SRCU lock, the mark can get
freed immediately which is slightly inconvenient and we will need to
avoid this in the future.

Move to a scheme where queueing of mark into a list of marks for
destruction happens when the last reference to the mark is dropped. Also
drop reference to the mark held by group list already when mark is
removed from that list instead of dropping it only from the destruction
worker.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 08991e83 01-Feb-2017 Jan Kara <jack@suse.cz>

fsnotify: Free fsnotify_mark_connector when there is no mark attached

Currently we free fsnotify_mark_connector structure only when inode /
vfsmount is getting freed. This can however impose noticeable memory
overhead when marks get attached to inodes only temporarily. So free the
connector structure once the last mark is detached from the object.
Since notification infrastructure can be working with the connector
under the protection of fsnotify_mark_srcu, we have to be careful and
free the fsnotify_mark_connector only after SRCU period passes.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 04662cab 01-Feb-2017 Jan Kara <jack@suse.cz>

fsnotify: Lock object list with connector lock

So far list of marks attached to an object (inode / vfsmount) was
protected by i_lock or mnt_root->d_lock. This dictates that the list
must be empty before the object can be destroyed although the list is
now anchored in the fsnotify_mark_connector structure. Protect the list
by a spinlock in the fsnotify_mark_connector structure to decouple
lifetime of a list of marks from a lifetime of the object. This also
simplifies the code quite a bit since we don't have to differentiate
between inode and vfsmount lists in quite a few places anymore.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 2629718d 15-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Remove useless list deletion and comment

After removing all the indirection it is clear that

hlist_del_init_rcu(&mark->obj_list);

in fsnotify_destroy_marks() is not needed as the mark gets removed from
the list shortly afterwards in fsnotify_destroy_mark() ->
fsnotify_detach_mark() -> fsnotify_detach_from_object(). Also there is
no problem with mark being visible on object list while we call
fsnotify_destroy_mark() as parallel destruction of marks from several
places is properly handled (as mentioned in the comment in
fsnotify_destroy_marks(). So just remove the list removal and also the
stale comment.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 73cd3c33 30-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Avoid double locking in fsnotify_detach_from_object()

We lock object list lock in fsnotify_detach_from_object() twice - once
to detach mark and second time to recalculate mask. That is unnecessary
and later it will become problematic as we will free the connector as
soon as there is no mark in it. So move recalculation of fsnotify mask
into the same critical section that is detaching mark.

This also removes recalculation of child dentry flags from
fsnotify_detach_from_object(). That is however fine. Those marks will
get recalculated once some event happens on a child.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 8212a609 15-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Remove indirection from fsnotify_detach_mark()

fsnotify_detach_mark() calls fsnotify_destroy_inode_mark() or
fsnotify_destroy_vfsmount_mark() to remove mark from object list. These
two functions are however very similar and differ only in the lock they
use to protect the object list of marks. Simplify the code by removing
the indirection and removing mark from the object list in a common
function.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# a03e2e4f 15-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Determine lock in fsnotify_destroy_marks()

Instead of passing spinlock into fsnotify_destroy_marks() determine it
directly in that function from the connector type. This will reduce code
churn when changing lock protecting list of marks.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# f06fd987 15-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Move locking into fsnotify_find_mark()

Move locking of a mark list into fsnotify_find_mark(). This reduces code
churn in the following patch changing lock protecting the list.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# a242677b 15-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Move locking into fsnotify_recalc_mask()

Move locking of locks protecting a list of marks into
fsnotify_recalc_mask(). This reduces code churn in the following patch
which changes the lock protecting the list of marks.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 0810b4f9 01-Feb-2017 Jan Kara <jack@suse.cz>

fsnotify: Move fsnotify_destroy_marks()

Move fsnotify_destroy_marks() to be later in the fs/notify/mark.c. It
will need some functions that are declared after its current
declaration. No functional change.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 755b5bc6 14-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Remove indirection from mark list addition

Adding notification mark to object list has been currently done through
fsnotify_add_{inode|vfsmount}_mark() helpers from
fsnotify_add_mark_locked() which call fsnotify_add_mark_list(). Remove
this unnecessary indirection to simplify the code.

Pushing all the locking to fsnotify_add_mark_list() also allows us to
allocate the connector structure with GFP_KERNEL mode.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# e911d8af 14-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Make fsnotify_mark_connector hold inode reference

Currently inode reference is held by fsnotify marks. Change the rules so
that inode reference is held by fsnotify_mark_connector structure
whenever the list is non-empty. This simplifies the code and is more
logical.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 86ffe245 14-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Move object pointer to fsnotify_mark_connector

Move pointer to inode / vfsmount from mark itself to the
fsnotify_mark_connector structure. This is another step on the path
towards decoupling inode / vfsmount lifetime from notification mark
lifetime.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# 9dd813c1 13-Mar-2017 Jan Kara <jack@suse.cz>

fsnotify: Move mark list head from object into dedicated structure

Currently notification marks are attached to object (inode or vfsmnt) by
a hlist_head in the object. The list is also protected by a spinlock in
the object. So while there is any mark attached to the list of marks,
the object must be pinned in memory (and thus e.g. last iput() deleting
inode cannot happen). Also for list iteration in fsnotify() to work, we
must hold fsnotify_mark_srcu lock so that mark itself and
mark->obj_list.next cannot get freed. Thus we are required to wait for
response to fanotify events from userspace process with
fsnotify_mark_srcu lock held. That causes issues when userspace process
is buggy and does not reply to some event - basically the whole
notification subsystem gets eventually stuck.

So to be able to drop fsnotify_mark_srcu lock while waiting for
response, we have to pin the mark in memory and make sure it stays in
the object list (as removing the mark waiting for response could lead to
lost notification events for groups later in the list). However we don't
want inode reclaim to block on such mark as that would lead to system
just locking up elsewhere.

This commit is the first in the series that paves way towards solving
these conflicting lifetime needs. Instead of anchoring the list of marks
directly in the object, we anchor it in a dedicated structure
(fsnotify_mark_connector) and just point to that structure from the
object. The following commits will also add spinlock protecting the list
and object pointer to the structure.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# c1f33073 16-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Update comments

Add a comment that lifetime of a notification mark is protected by SRCU
and remove a comment about clearing of marks attached to the inode. It
is stale and more uptodate version is at fsnotify_destroy_marks() which
is the function handling this case.

Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>


# e3ba7307 22-Dec-2016 Jan Kara <jack@suse.cz>

fsnotify: Remove fsnotify_duplicate_mark()

There are only two calls sites of fsnotify_duplicate_mark(). Those are
in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
for audit tree, inode pointer and group gets set in
fsnotify_add_mark_locked() later anyway, mask and free_mark are already
set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
actively harmful because following fsnotify_add_mark_locked() will leak
group reference by overwriting the group pointer. So just remove the two
calls to fsnotify_duplicate_mark() and the function.

Signed-off-by: Jan Kara <jack@suse.cz>
[PM: line wrapping to fit in 80 chars]
Signed-off-by: Paul Moore <paul@paul-moore.com>


# 35e48176 19-May-2016 Jan Kara <jack@suse.cz>

fsnotify: avoid spurious EMFILE errors from inotify_init()

Inotify instance is destroyed when all references to it are dropped.
That not only means that the corresponding file descriptor needs to be
closed but also that all corresponding instance marks are freed (as each
mark holds a reference to the inotify instance). However marks are
freed only after SRCU period ends which can take some time and thus if
user rapidly creates and frees inotify instances, number of existing
inotify instances can exceed max_user_instances limit although from user
point of view there is always at most one existing instance. Thus
inotify_init() returns EMFILE error which is hard to justify from user
point of view. This problem is exposed by LTP inotify06 testcase on
some machines.

We fix the problem by making sure all group marks are properly freed
while destroying inotify instance. We wait for SRCU period to end in
that path anyway since we have to make sure there is no event being
added to the instance while we are tearing down the instance. So it
takes only some plumbing to allow for marks to be destroyed in that path
as well and not from a dedicated work item.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Tested-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0918f1c3 17-Feb-2016 Jeff Layton <jlayton@kernel.org>

fsnotify: turn fsnotify reaper thread into a workqueue job

We don't require a dedicated thread for fsnotify cleanup. Switch it
over to a workqueue job instead that runs on the system_unbound_wq.

In the interest of not thrashing the queued job too often when there are
a lot of marks being removed, we delay the reaper job slightly when
queueing it, to allow several to gather on the list.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 13d34ac6 17-Feb-2016 Jeff Layton <jlayton@kernel.org>

Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"

This reverts commit c510eff6beba ("fsnotify: destroy marks with
call_srcu instead of dedicated thread").

Eryu reported that he was seeing some OOM kills kick in when running a
testcase that adds and removes inotify marks on a file in a tight loop.

The above commit changed the code to use call_srcu to clean up the
marks. While that does (in principle) work, the srcu callback job is
limited to cleaning up entries in small batches and only once per jiffy.
It's easily possible to overwhelm that machinery with too many call_srcu
callbacks, and Eryu's reproduer did just that.

There's also another potential problem with using call_srcu here. While
you can obviously sleep while holding the srcu_read_lock, the callbacks
run under local_bh_disable, so you can't sleep there.

It's possible when putting the last reference to the fsnotify_mark that
we'll end up putting a chain of references including the fsnotify_group,
uid, and associated keys. While I don't see any obvious ways that that
could occurs, it's probably still best to avoid using call_srcu here
after all.

This patch reverts the above patch. A later patch will take a different
approach to eliminated the dedicated thread here.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reported-by: Eryu Guan <guaneryu@gmail.com>
Tested-by: Eryu Guan <guaneryu@gmail.com>
Cc: Jan Kara <jack@suse.com>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# c510eff6 14-Jan-2016 Jeff Layton <jlayton@kernel.org>

fsnotify: destroy marks with call_srcu instead of dedicated thread

At the time that this code was originally written, call_srcu didn't
exist, so this thread was required to ensure that we waited for that
SRCU grace period to settle before finally freeing the object.

It does exist now however and we can much more efficiently use call_srcu
to handle this. That also allows us to potentially use srcu_barrier to
ensure that they are all of the callbacks have run before proceeding.
In order to conserve space, we union the rcu_head with the g_list.

This will be necessary for nfsd which will allocate marks from a
dedicated slabcache. We have to be able to ensure that all of the
objects are destroyed before destroying the cache. That's fairly

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: Eric Paris <eparis@parisplace.org>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4712e722 04-Sep-2015 Jan Kara <jack@suse.com>

fsnotify: get rid of fsnotify_destroy_mark_locked()

fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex. To avoid future problems with this
function, split it into two.

fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex. This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 925d1132 04-Sep-2015 Jan Kara <jack@suse.com>

fsnotify: remove mark->free_list

Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away. However we can free all of
the marks without using a special list with some care.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8f2f3eb5 06-Aug-2015 Jan Kara <jack@suse.com>

fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
drops mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and thus the next
entry pointer we have cached may become stale and we dereference free
memory.

Fix the problem by first moving marks to free to a special private list
and then always free the first entry in the special list. This method
is safe even when entries from the list can disappear once we drop the
lock.

Signed-off-by: Jan Kara <jack@suse.com>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# d725e66c 21-Jul-2015 Linus Torvalds <torvalds@linux-foundation.org>

Revert "fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()"

This reverts commit a2673b6e040663bf16a552f8619e6bde9f4b9acf.

Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:

"Thanks for report! You are right that my patch introduces a race
between fsnotify kthread and fsnotify_destroy_group() which can result
in leaking inotify event on group destruction.

I haven't yet decided whether the right fix is not to queue events for
dying notification group (as that is pointless anyway) or whether we
should just fix the original problem differently... Whenever I look
at fsnotify code mark handling I get lost in the maze of locks, lists,
and subtle differences between how different notification systems
handle notification marks :( I'll think about it over night"

and after thinking about it, Jan says:

"OK, I have looked into the code some more and I found another
relatively simple way of fixing the original oops. It will be IMHO
better than trying to fixup this issue which has more potential for
breakage. I'll ask Linus to revert the fsnotify fix he already merged
and send a new fix"

Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Requested-by: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a2673b6e 17-Jul-2015 Jan Kara <jack@suse.cz>

fsnotify: fix oops in fsnotify_clear_marks_by_group_flags()

fsnotify_clear_marks_by_group_flags() can race with
fsnotify_destroy_marks() so when fsnotify_destroy_mark_locked() drops
mark_mutex, a mark from the list iterated by
fsnotify_clear_marks_by_group_flags() can be freed and we dereference free
memory in the loop there.

Fix the problem by keeping mark_mutex held in
fsnotify_destroy_mark_locked(). The reason why we drop that mutex is that
we need to call a ->freeing_mark() callback which may acquire mark_mutex
again. To avoid this and similar lock inversion issues, we move the call
to ->freeing_mark() callback to the kthread destroying the mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Ashish Sangwan <a.sangwan@samsung.com>
Suggested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 37d469e7 12-Dec-2014 Jan Kara <jack@suse.cz>

fsnotify: remove destroy_list from fsnotify_mark

destroy_list is used to track marks which still need waiting for srcu
period end before they can be freed. However by the time mark is added to
destroy_list it isn't in group's list of marks anymore and thus we can
reuse fsnotify_mark->g_list for queueing into destroy_list. This saves
two pointers for each fsnotify_mark.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0809ab69 12-Dec-2014 Jan Kara <jack@suse.cz>

fsnotify: unify inode and mount marks handling

There's a lot of common code in inode and mount marks handling. Factor it
out to a common helper function.

Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 8edc6e16 13-Nov-2014 Jan Kara <jack@suse.cz>

fanotify: fix notification of groups with inode & mount marks

fsnotify() needs to merge inode and mount marks lists when notifying
groups about events so that ignore masks from inode marks are reflected
in mount mark notifications and groups are notified in proper order
(according to priorities).

Currently the sorting of the lists done by fsnotify_add_inode_mark() /
fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted
ignore masks not being used in some cases.

Fix the problem by always using the same comparison function when
sorting / merging the mark lists.

Thanks to Heinrich Schuchardt for improvements of my patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# efa8f7e5 04-Jun-2014 David Cohen <david.a.cohen@linux.intel.com>

fs/notify/mark.c: trivial cleanup

Do not initialize private_destroy_list twice. list_replace_init()
already takes care of initializing private_destroy_list. We don't need
to initialize it with LIST_HEAD() beforehand.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9756b918 08-Jul-2013 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: update comments concerning locking scheme

There have been changes in the locking scheme of fsnotify but the
comments in the source code have not been updated yet. This patch
corrects this.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 6960b0d9 11-Aug-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: change locking order

On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote:
>
> I finally built and tested a v3.0 kernel with these patches (I know I'm
> SOOOOOO far behind). Not what I hoped for:
>
> > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day...
> > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
> > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0
> > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 150.946012] CPU 0
> > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan]
> > [ 150.946012]
> > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM
> > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282
> > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438
> > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240
> > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000
> > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428
> > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610
> > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
> > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0
> > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760)
> > [ 150.946012] Stack:
> > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76
> > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250
> > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438
> > [ 150.946012] Call Trace:
> > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130
> > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170
> > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190
> > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40
> > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160
> > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50
> > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0
> > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b
> > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00
> > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a
> > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50
> > [ 150.946012] RSP <ffff88002c2e5df8>
> > [ 150.946012] CR2: 0000000000000070
>
> Looks at aweful lot like the problem from:
> http://www.spinics.net/lists/linux-fsdevel/msg46101.html
>

I tried to reproduce this bug with your test program, but without success.
However, if I understand correctly, this occurs since we dont hold any locks when
we call iput() in mark_destroy(), right?
With the patches you tested, iput() is also not called within any lock, since the
groups mark_mutex is released temporarily before iput() is called. This is, since
the original codes behaviour is similar.
However since we now have a mutex as the biggest lock, we can do what you
suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and
call iput() with the mutex held to avoid the race.
The patch below implements this. It uses nested locking to avoid deadlock in case
we do the final iput() on an inode which still holds marks and thus would take
the mutex again when calling fsnotify_inode_delete() in destroy_inode().

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 64c20d2a 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: dont put marks on temporary list when clearing marks by group

In clear_marks_by_group_flags() the mark list of a group is iterated and the
marks are put on a temporary list.
Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list
any more and are able to remove the marks while the mark list is iterated and
the mark list mutex is held.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# d5a335b8 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark()

This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked()
which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but
assume that the caller has already taken the groups mark mutex.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# e2a29943 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: pass group to fsnotify_destroy_mark()

In fsnotify_destroy_mark() dont get the group from the passed mark anymore,
but pass the group itself as an additional parameter to the function.

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 986ab098 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: use a mutex instead of a spinlock to protect a groups mark list

Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead
of a spinlock results in more flexibility (i.e it allows to sleep while the
lock is held).

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 104d06f0 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: take groups mark_lock before mark lock

Race-free addition and removal of a mark to a groups mark list would be easier
if we could lock the mark list of group before we lock the specific mark.
This patch changes the order used to add/remove marks to/from mark lists from

1. mark->lock
2. group->mark_lock
3. inode->i_lock

to

1. group->mark_lock
2. mark->lock
3. inode->i_lock

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 23e964c2 14-Jun-2011 Lino Sanfilippo <LinoSanfilippo@gmx.de>

fsnotify: use reference counting for groups

Get a group ref for each mark that is added to the groups list and release that
ref when the mark is freed in fsnotify_put_mark().
We also use get a group reference for duplicated marks and for private event
data.
Now we dont free a group any more when the number of marks becomes 0 but when
the groups ref count does. Since this will only happen when all marks are removed
from a groups mark list, we dont have to set the groups number of marks to 1 at
group creation.

Beside clearing all marks in fsnotify_destroy_group() we do also flush the
groups event queue. This is since events may hold references to groups (due to
private event data) and we have to put those references first before we get a
chance to put the final ref, which will result in a call to
fsnotify_final_destroy_group().

Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# fed47485 12-Jan-2012 Miklos Szeredi <mszeredi@suse.cz>

fsnotify: don't BUG in fsnotify_destroy_mark()

Removing the parent of a watched file results in "kernel BUG at
fs/notify/mark.c:139".

To reproduce

add "-w /tmp/audit/dir/watched_file" to audit.rules
rm -rf /tmp/audit/dir

This is caused by fsnotify_destroy_mark() being called without an
extra reference taken by the caller.

Reported by Francesco Cosoleto here:

https://bugzilla.novell.com/show_bug.cgi?id=689860

Fix by removing the BUG_ON and adding a comment about not accessing mark after
the iput.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 60063497 26-Jul-2011 Arun Sharma <asharma@fb.com>

atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 67a23c49 22-Mar-2011 Dave Chinner <dchinner@redhat.com>

fs: rename inode_lock to inode_hash_lock

All that remains of the inode_lock is protecting the inode hash list
manipulation and traversals. Rename the inode_lock to
inode_hash_lock to reflect it's actual function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 02436668 28-Jul-2010 Eric Paris <eparis@redhat.com>

fsnotify: remove global fsnotify groups lists

The global fsnotify groups lists were invented as a way to increase the
performance of fsnotify by shortcutting events which were not interesting.
With the changes to walk the object lists rather than global groups lists
these shortcuts are not useful.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 8778abb9 28-Jul-2010 Andreas Gruenbacher <agruen@suse.de>

fsnotify: Exchange list heads instead of moving elements

Instead of moving list elements from destroy_list to &private_destroy_list,
exchange the list heads.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 75c1be48 28-Jul-2010 Eric Paris <eparis@redhat.com>

fsnotify: srcu to protect read side of inode and vfsmount locks

Currently reading the inode->i_fsnotify_marks or
vfsmount->mnt_fsnotify_marks lists are protected by a spinlock on both the
read and the write side. This patch protects the read side of those lists
with a new single srcu.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 700307a2 28-Jul-2010 Eric Paris <eparis@redhat.com>

fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called

Currently fsnotify check is mark->group is NULL to decide if
fsnotify_destroy_mark() has already been called or not. With the upcoming
rcu work it is a heck of a lot easier to use an explicit flag than worry
about group being set to NULL.

Signed-off-by: Eric Paris <eparis@redhat.com>


# b31d397e 21-Apr-2010 Eric Paris <eparis@redhat.com>

fsnotify: call iput on inodes when no longer marked

fsnotify takes an igrab on an inode when it adds a mark. The code was
supposed to drop the reference when the mark was removed but didn't.
This caused problems when an fs was unmounted because those inodes would
clearly not be gone. Thus resulting in the most devistating of messages:

VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
>>> Have a nice day...

Jiri Slaby bisected the problem to a patch in the fsnotify tree. The
code snippets below show my stupidity quite clearly.

void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
{
...
mark->inode = NULL;
...
}

void fsnotify_destroy_mark(struct fsnotify_mark *mark)
{
struct inode *inode = NULL;
...
if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
fsnotify_destroy_inode_mark(mark);
inode = mark->i.inode;
}
...
if (inode)
iput(inode);
...
}

Obviously the intent was to capture the inode before it was set to NULL in
fsnotify_destory_inode_mark() so we wouldn't be leaking inodes forever.
Instead we leaked them (and exploded on umount)

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Eric Paris <eparis@redhat.com>


# 4d92604c 17-Dec-2009 Eric Paris <eparis@redhat.com>

fanotify: clear all fanotify marks

fanotify listeners may want to clear all marks. They may want to do this
to destroy all of their inode marks which have nothing but ignores.
Realistically this is useful for av vendors who update policy and want to
clear all of their cached allows.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 33af5e32 17-Dec-2009 Eric Paris <eparis@redhat.com>

fsnotify: ignored_mask - excluding notification

The ignored_mask is a new mask which is part of fsnotify marks. A group's
should_send_event() function can use the ignored mask to determine that
certain events are not of interest. In particular if a group registers a
mask including FS_OPEN on a vfsmount they could add FS_OPEN to the
ignored_mask for individual inodes and not send open events for those
inodes.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 90b1e7a5 17-Dec-2009 Eric Paris <eparis@redhat.com>

fsnotify: allow marks to not pin inodes in core

inotify marks must pin inodes in core. dnotify doesn't technically need to
since they are closed when the directory is closed. fanotify also need to
pin inodes in core as it works today. But the next step is to introduce
the concept of 'ignored masks' which is actually a mask of events for an
inode of no interest. I claim that these should be liberally sent to the
kernel and should not pin the inode in core. If the inode is brought back
in the listener will get an event it may have thought excluded, but this is
not a serious situation and one any listener should deal with.

This patch lays the ground work for non-pinning inode marks by using lazy
inode pinning. We do not pin a mark until it has a non-zero mask entry. If a
listener new sets a mask we never pin the inode.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 0d48b7f0 17-Dec-2009 Eric Paris <eparis@redhat.com>

fsnotify: vfsmount marks generic functions

Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.

Signed-off-by: Eric Paris <eparis@redhat.com>


# ba643f04 17-Dec-2009 Eric Paris <eparis@redhat.com>

fsnotify: clear marks to 0 in fsnotify_init_mark

Currently fsnotify_init_mark sets some fields to 0/NULL. Some users
already used some sorts of zalloc, some didn't. This patch uses memset to
explicitly zero everything in the fsnotify_mark when it is initialized so we
don't have to be careful if fields are later added to marks.

Signed-off-by: Eric Paris <eparis@redhat.com>


# 5444e298 17-Dec-2009 Eric Paris <eparis@redhat.com>

fsnotify: split generic and inode specific mark code

currently all marking is done by functions in inode-mark.c. Some of this
is pretty generic and should be instead done in a generic function and we
should only put the inode specific code in inode-mark.c

Signed-off-by: Eric Paris <eparis@redhat.com>