Searched +hist:6 +hist:b19b766 (Results 1 - 16 of 16) sorted by relevance

/linux-master/fs/cachefiles/
H A Dio.cdiff 6ae65439 Wed Nov 22 05:27:12 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: move kiocb_start_write() into vfs_iocb_iter_write()

In vfs code, sb_start_write() is usually called after the permission hook
in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
where kiocb_start_write() is called by its callers.

Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
after the rw_verify_area() checks, to make them "start-write-safe".

The semantics of vfs_iocb_iter_write() is changed, so that the caller is
responsible for calling kiocb_end_write() on completion only if async
iocb was queued. The completion handlers of both callers were adapted
to this semantic change.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 9032b6e8 Mon Apr 25 06:21:27 MDT 2022 Jeffle Xu <jefflexu@linux.alibaba.com> cachefiles: implement on-demand read

Implement the data plane of on-demand read mode.

The early implementation [1] place the entry to
cachefiles_ondemand_read() in fscache_read(). However, fscache_read()
can only detect if the requested file range is fully cache miss, whilst
we need to notify the user daemon as long as there's a hole inside the
requested file range.

Thus the entry is now placed in cachefiles_prepare_read(). When working
in on-demand read mode, once a hole detected, the read routine will send
a READ request to the user daemon. The user daemon needs to fetch the
data and write it to the cache file. After sending the READ request, the
read routine will hang there, until the READ request is handled by the
user daemon. Then it will retry to read from the same file range. If no
progress encountered, the read routine will fail then.

A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand
read should be done when a cache miss encountered.

[1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
diff 9032b6e8 Mon Apr 25 06:21:27 MDT 2022 Jeffle Xu <jefflexu@linux.alibaba.com> cachefiles: implement on-demand read

Implement the data plane of on-demand read mode.

The early implementation [1] place the entry to
cachefiles_ondemand_read() in fscache_read(). However, fscache_read()
can only detect if the requested file range is fully cache miss, whilst
we need to notify the user daemon as long as there's a hole inside the
requested file range.

Thus the entry is now placed in cachefiles_prepare_read(). When working
in on-demand read mode, once a hole detected, the read routine will send
a READ request to the user daemon. The user daemon needs to fetch the
data and write it to the cache file. After sending the READ request, the
read routine will hang there, until the READ request is handled by the
user daemon. Then it will retry to read from the same file range. If no
progress encountered, the read routine will fail then.

A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand
read should be done when a cache miss encountered.

[1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
diff 6a19114b Thu Feb 17 03:01:23 MST 2022 David Howells <dhowells@redhat.com> netfs: Rename netfs_read_*request to netfs_io_*request

Rename netfs_read_*request to netfs_io_*request so that the same structures
can be used for the write helpers too.

perl -p -i -e 's/netfs_read_(request|subrequest)/netfs_io_$1/g' \
`git grep -l 'netfs_read_\(sub\|\)request'`
perl -p -i -e 's/nr_rd_ops/nr_outstanding/g' \
`git grep -l nr_rd_ops`
perl -p -i -e 's/nr_wr_ops/nr_copy_ops/g' \
`git grep -l nr_wr_ops`
perl -p -i -e 's/netfs_read_source/netfs_io_source/g' \
`git grep -l 'netfs_read_source'`
perl -p -i -e 's/netfs_io_request_ops/netfs_request_ops/g' \
`git grep -l 'netfs_io_request_ops'`
perl -p -i -e 's/init_rreq/init_request/g' \
`git grep -l 'init_rreq'`

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

Link: https://lore.kernel.org/r/164622988070.3564931.7089670190434315183.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/164678195157.1200972.366609966927368090.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/164692891535.2099075.18435198075367420588.stgit@warthog.procyon.org.uk/ # v3
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/drivers/nvme/target/
H A Dio-cmd-file.cdiff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/block/
H A Dfops.cdiff 487c607d Tue Aug 01 11:22:00 MDT 2023 Christoph Hellwig <hch@lst.de> block: use iomap for writes to block devices

Use iomap in buffer_head compat mode to write to block devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230801172201.1923299-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 37fee2e4 Fri Jun 10 01:58:24 MDT 2022 Keith Busch <kbusch@kernel.org> block: add a helper function for dio alignment

This will make it easier to add more complex acceptable alignment
criteria in the future.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220610195830.3574005-6-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/fs/overlayfs/
H A Dfile.cdiff 6ae65439 Wed Nov 22 05:27:12 MST 2023 Amir Goldstein <amir73il@gmail.com> fs: move kiocb_start_write() into vfs_iocb_iter_write()

In vfs code, sb_start_write() is usually called after the permission hook
in rw_verify_area(). vfs_iocb_iter_write() is an exception to this rule,
where kiocb_start_write() is called by its callers.

Move kiocb_start_write() from the callers into vfs_iocb_iter_write()
after the rw_verify_area() checks, to make them "start-write-safe".

The semantics of vfs_iocb_iter_write() is changed, so that the caller is
responsible for calling kiocb_end_write() on completion only if async
iocb was queued. The completion handlers of both callers were adapted
to this semantic change.

This is needed for fanotify "pre content" events.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20231122122715.2561213-14-amir73il@gmail.com
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 724768a3 Tue Aug 22 11:50:59 MDT 2023 Amir Goldstein <amir73il@gmail.com> ovl: fix incorrect fdput() on aio completion

ovl_{read,write}_iter() always call fdput(real) to put one or zero
refcounts of the real file, but for aio, whether it was submitted or not,
ovl_aio_put() also calls fdput(), which is not balanced. This is only a
problem in the less common case when FDPUT_FPUT flag is set.

To fix the problem use get_file() to take file refcount and use fput()
instead of fdput() in ovl_aio_put().

Fixes: 2406a307ac7d ("ovl: implement async IO routines")
Cc: <stable@vger.kernel.org> # v5.6
Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
diff bc2473c9 Thu Jun 15 05:22:29 MDT 2023 Amir Goldstein <amir73il@gmail.com> ovl: enable fsnotify events on underlying real files

Overlayfs creates the real underlying files with fake f_path, whose
f_inode is on the underlying fs and f_path on overlayfs.

Those real files were open with FMODE_NONOTIFY, because fsnotify code was
not prapared to handle fsnotify hooks on files with fake path correctly
and fanotify would report unexpected event->fd with fake overlayfs path,
when the underlying fs was being watched.

Teach fsnotify to handle events on the real files, and do not set real
files to FMODE_NONOTIFY to allow operations on real file (e.g. open,
access, modify, close) to generate async and permission events.

Because fsnotify does not have notifications on address space
operations, we do not need to worry about ->vm_file not reporting
events to a watched overlayfs when users are accessing a mapped
overlayfs file.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Message-Id: <20230615112229.2143178-6-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 9a254403 Wed Sep 29 21:22:28 MDT 2021 yangerkun <yangerkun@huawei.com> ovl: fix use after free in struct ovl_aio_req

Example for triggering use after free in a overlay on ext4 setup:

aio_read
ovl_read_iter
vfs_iter_read
ext4_file_read_iter
ext4_dio_read_iter
iomap_dio_rw -> -EIOCBQUEUED
/*
* Here IO is completed in a separate thread,
* ovl_aio_cleanup_handler() frees aio_req which has iocb embedded
*/
file_accessed(iocb->ki_filp); /**BOOM**/

Fix by introducing a refcount in ovl_aio_req similarly to aio_kiocb. This
guarantees that iocb is only freed after vfs_read/write_iter() returns on
underlying fs.

Fixes: 2406a307ac7d ("ovl: implement async IO routines")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210930032228.3199690-3-yangerkun@huawei.com/
Cc: <stable@vger.kernel.org> # v5.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 47291baa Thu Jan 21 06:19:24 MST 2021 Christian Brauner <christian.brauner@ubuntu.com> namei: make permission helpers idmapped mount aware

The two helpers inode_permission() and generic_permission() are used by
the vfs to perform basic permission checking by verifying that the
caller is privileged over an inode. In order to handle idmapped mounts
we extend the two helpers with an additional user namespace argument.
On idmapped mounts the two helpers will make sure to map the inode
according to the mount's user namespace and then peform identical
permission checks to inode_permission() and generic_permission(). If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
/linux-master/drivers/usb/gadget/legacy/
H A Dinode.cdiff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff f5678e7f Wed Jun 10 19:42:06 MDT 2020 Christoph Hellwig <hch@lst.de> kernel: better document the use_mm/unuse_mm API contract

Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 6e76c01e Thu Sep 21 14:12:01 MDT 2017 Alan Stern <stern@rowland.harvard.edu> USB: gadgetfs: fix copy_to_user while holding spinlock

The gadgetfs driver as a long-outstanding FIXME, regarding a call of
copy_to_user() made while holding a spinlock. This patch fixes the
issue by dropping the spinlock and using the dev->udc_usage mechanism
introduced by another recent patch to guard against status changes
while the lock isn't held.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
CC: <stable@vger.kernel.org>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff f50b878f Thu Jun 08 11:55:59 MDT 2017 Alan Stern <stern@rowland.harvard.edu> USB: gadget: fix GPF in gadgetfs

A NULL-pointer dereference bug in gadgetfs was uncovered by syzkaller:

> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
> CPU: 2 PID: 4820 Comm: syz-executor0 Not tainted 4.12.0-rc4+ #5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> task: ffff880039542dc0 task.stack: ffff88003bdd0000
> RIP: 0010:__list_del_entry_valid+0x7e/0x170 lib/list_debug.c:51
> RSP: 0018:ffff88003bdd6e50 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000010000
> RDX: 0000000000000000 RSI: ffffffff86504948 RDI: ffffffff86504950
> RBP: ffff88003bdd6e68 R08: ffff880039542dc0 R09: ffffffff8778ce00
> R10: ffff88003bdd6e68 R11: dffffc0000000000 R12: 0000000000000000
> R13: dffffc0000000000 R14: 1ffff100077badd2 R15: ffffffff864d2e40
> FS: 0000000000000000(0000) GS:ffff88006dc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000002014aff9 CR3: 0000000006022000 CR4: 00000000000006e0
> Call Trace:
> __list_del_entry include/linux/list.h:116 [inline]
> list_del include/linux/list.h:124 [inline]
> usb_gadget_unregister_driver+0x166/0x4c0 drivers/usb/gadget/udc/core.c:1387
> dev_release+0x80/0x160 drivers/usb/gadget/legacy/inode.c:1187
> __fput+0x332/0x7f0 fs/file_table.c:209
> ____fput+0x15/0x20 fs/file_table.c:245
> task_work_run+0x19b/0x270 kernel/task_work.c:116
> exit_task_work include/linux/task_work.h:21 [inline]
> do_exit+0x18a3/0x2820 kernel/exit.c:878
> do_group_exit+0x149/0x420 kernel/exit.c:982
> get_signal+0x77f/0x1780 kernel/signal.c:2318
> do_signal+0xd2/0x2130 arch/x86/kernel/signal.c:808
> exit_to_usermode_loop+0x1a7/0x240 arch/x86/entry/common.c:157
> prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> syscall_return_slowpath+0x3ba/0x410 arch/x86/entry/common.c:263
> entry_SYSCALL_64_fastpath+0xbc/0xbe
> RIP: 0033:0x4461f9
> RSP: 002b:00007fdac2b1ecf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 00000000007080c8 RCX: 00000000004461f9
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000007080c8
> RBP: 00000000007080a8 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007fdac2b1f9c0 R15: 00007fdac2b1f700
> Code: 00 00 00 00 ad de 49 39 c4 74 6a 48 b8 00 02 00 00 00 00 ad de
> 48 89 da 48 39 c3 74 74 48 c1 ea 03 48 b8 00 00 00 00 00 fc ff df <80>
> 3c 02 00 0f 85 92 00 00 00 48 8b 13 48 39 f2 75 66 49 8d 7c
> RIP: __list_del_entry_valid+0x7e/0x170 lib/list_debug.c:51 RSP: ffff88003bdd6e50
> ---[ end trace 30e94b1eec4831c8 ]---
> Kernel panic - not syncing: Fatal exception

The bug was caused by dev_release() failing to turn off its
gadget_registered flag after unregistering the gadget driver. As a
result, when a later user closed the device file before writing a
valid set of descriptors, dev_release() thought the gadget had been
registered and tried to unregister it, even though it had not been.
This led to the NULL pointer dereference.

The fix is simple: turn off the flag when the gadget is unregistered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
/linux-master/fs/iomap/
H A Ddirect-io.cdiff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6fe7b990 Thu Sep 19 16:32:44 MDT 2019 Matthew Bobrowski <mbobrowski@mbobrowski.org> iomap: split size and error for iomap_dio_rw ->end_io

Modify the calling convention for the iomap_dio_rw ->end_io() callback.
Rather than passing either dio->error or dio->size as the 'size' argument,
instead pass both the dio->error and the dio->size value separately.

In the instance that an error occurred during a write, we currently cannot
determine whether any blocks have been allocated beyond the current EOF and
data has subsequently been written to these blocks within the ->end_io()
callback. As a result, we cannot judge whether we should take the truncate
failed write path. Having both dio->error and dio->size will allow us to
perform such checks within this callback.

Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
[hch: minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
/linux-master/crypto/
H A Daf_alg.cdiff 6b4b53ca Tue Aug 01 07:57:09 MDT 2023 Frederick Lawler <fred@cloudflare.com> crypto: af_alg - Decrement struct key.usage in alg_set_by_key_serial()

Calls to lookup_user_key() require a corresponding key_put() to
decrement the usage counter. Once it reaches zero, we schedule key GC.
Therefore decrement struct key.usage in alg_set_by_key_serial().

Fixes: 7984ceb134bf ("crypto: af_alg - Support symmetric encryption via keyring keys")
Cc: <stable@vger.kernel.org>
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff 6a4b8aa0 Wed Jul 26 15:53:19 MDT 2023 David Howells <dhowells@redhat.com> crypto: af_alg - Fix missing initialisation affecting gcm-aes-s390

Fix af_alg_alloc_areq() to initialise areq->first_rsgl.sgl.sgt.sgl to point
to the scatterlist array in areq->first_rsgl.sgl.sgl.

Without this, the gcm-aes-s390 driver will oops when it tries to do
gcm_walk_start() on req->dst because req->dst is set to the value of
areq->first_rsgl.sgl.sgl by _aead_recvmsg() calling
aead_request_set_crypt().

The problem comes if an empty ciphertext is passed: the loop in
af_alg_get_rsgl() just passes straight out and doesn't set areq->first_rsgl
up.

This isn't a problem on x86_64 using gcmaes_crypt_by_sg() because, as far
as I can tell, that ignores req->dst and only uses req->src[*].

[*] Is this a bug in aesni-intel_glue.c?

The s390x oops looks something like:

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 0000000a00000000 TEID: 0000000a00000803
Fault in home space mode while using kernel ASCE.
AS:00000000a43a0007 R3:0000000000000024
Oops: 003b ilc:2 [#1] SMP
...
Call Trace:
[<000003ff7fc3d47e>] gcm_walk_start+0x16/0x28 [aes_s390]
[<00000000a2a342f2>] crypto_aead_decrypt+0x9a/0xb8
[<00000000a2a60888>] aead_recvmsg+0x478/0x698
[<00000000a2e519a0>] sock_recvmsg+0x70/0xb0
[<00000000a2e51a56>] sock_read_iter+0x76/0xa0
[<00000000a273e066>] vfs_read+0x26e/0x2a8
[<00000000a273e8c4>] ksys_read+0xbc/0x100
[<00000000a311d808>] __do_syscall+0x1d0/0x1f8
[<00000000a312ff30>] system_call+0x70/0x98
Last Breaking-Event-Address:
[<000003ff7fc3e6b4>] gcm_aes_crypt+0x104/0xa68 [aes_s390]

Fixes: c1abe6f570af ("crypto: af_alg: Use extract_iter_to_sg() to create scatterlists")
Reported-by: Ondrej Mosnáček <omosnacek@gmail.com>
Link: https://lore.kernel.org/r/CAAUqJDuRkHE8fPgZJGaKjUjd3QfGwzfumuJBmStPqBhubxyk_A@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: Sven Schnelle <svens@linux.ibm.com>
cc: Harald Freudenberger <freude@linux.vnet.ibm.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-crypto@vger.kernel.org
cc: linux-s390@vger.kernel.org
cc: regressions@lists.linux.dev
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Ondrej Mosnáček <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff 43804992 Thu Jun 15 15:08:54 MDT 2023 David Howells <dhowells@redhat.com> crypto: Fix af_alg_sendmsg(MSG_SPLICE_PAGES) sglist limit

When af_alg_sendmsg() calls extract_iter_to_sg(), it passes MAX_SGL_ENTS as
the maximum number of elements that may be written to, but some of the
elements may already have been used (as recorded in sgl->cur), so
extract_iter_to_sg() may end up overrunning the scatterlist.

Fix this to limit the number of elements to "MAX_SGL_ENTS - sgl->cur".

Note: It probably makes sense in future to alter the behaviour of
extract_iter_to_sg() to stop if "sgtable->nents >= sg_max" instead, but
this is a smaller fix for now.

The bug causes errors looking something like:

BUG: KASAN: slab-out-of-bounds in sg_assign_page include/linux/scatterlist.h:109 [inline]
BUG: KASAN: slab-out-of-bounds in sg_set_page include/linux/scatterlist.h:139 [inline]
BUG: KASAN: slab-out-of-bounds in extract_bvec_to_sg lib/scatterlist.c:1183 [inline]
BUG: KASAN: slab-out-of-bounds in extract_iter_to_sg lib/scatterlist.c:1352 [inline]
BUG: KASAN: slab-out-of-bounds in extract_iter_to_sg+0x17a6/0x1960 lib/scatterlist.c:1339

Fixes: bf63e250c4b1 ("crypto: af_alg: Support MSG_SPLICE_PAGES")
Reported-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b2585a05fdeb8379@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 43804992 Thu Jun 15 15:08:54 MDT 2023 David Howells <dhowells@redhat.com> crypto: Fix af_alg_sendmsg(MSG_SPLICE_PAGES) sglist limit

When af_alg_sendmsg() calls extract_iter_to_sg(), it passes MAX_SGL_ENTS as
the maximum number of elements that may be written to, but some of the
elements may already have been used (as recorded in sgl->cur), so
extract_iter_to_sg() may end up overrunning the scatterlist.

Fix this to limit the number of elements to "MAX_SGL_ENTS - sgl->cur".

Note: It probably makes sense in future to alter the behaviour of
extract_iter_to_sg() to stop if "sgtable->nents >= sg_max" instead, but
this is a smaller fix for now.

The bug causes errors looking something like:

BUG: KASAN: slab-out-of-bounds in sg_assign_page include/linux/scatterlist.h:109 [inline]
BUG: KASAN: slab-out-of-bounds in sg_set_page include/linux/scatterlist.h:139 [inline]
BUG: KASAN: slab-out-of-bounds in extract_bvec_to_sg lib/scatterlist.c:1183 [inline]
BUG: KASAN: slab-out-of-bounds in extract_iter_to_sg lib/scatterlist.c:1352 [inline]
BUG: KASAN: slab-out-of-bounds in extract_iter_to_sg+0x17a6/0x1960 lib/scatterlist.c:1339

Fixes: bf63e250c4b1 ("crypto: af_alg: Support MSG_SPLICE_PAGES")
Reported-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000b2585a05fdeb8379@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6efc50cc1f8d718d6cb7@syzkaller.appspotmail.com
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-crypto@vger.kernel.org
cc: netdev@vger.kernel.org
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 25206111 Sun Feb 13 12:06:07 MST 2022 Eric Dumazet <edumazet@google.com> crypto: af_alg - get rid of alg_memory_allocated

alg_memory_allocated does not seem to be really used.

alg_proto does have a .memory_allocated field, but no
corresponding .sysctl_mem.

This means sk_has_account() returns true, but all sk_prot_mem_limits()
users will trigger a NULL dereference [1].

THis was not a problem until SO_RESERVE_MEM addition.

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3591 Comm: syz-executor153 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sock_setsockopt+0x14a9/0x3a30 net/core/sock.c:1446
__sys_setsockopt+0x5af/0x980 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xb1/0xc0 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc7440fddc9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe98f07968 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc7440fddc9
RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000004 R09: 00007ffe98f07990
R10: 0000000020000000 R11: 0000000000000246 R12: 00007ffe98f0798c
R13: 00007ffe98f079a0 R14: 00007ffe98f079e0 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 25206111 Sun Feb 13 12:06:07 MST 2022 Eric Dumazet <edumazet@google.com> crypto: af_alg - get rid of alg_memory_allocated

alg_memory_allocated does not seem to be really used.

alg_proto does have a .memory_allocated field, but no
corresponding .sysctl_mem.

This means sk_has_account() returns true, but all sk_prot_mem_limits()
users will trigger a NULL dereference [1].

THis was not a problem until SO_RESERVE_MEM addition.

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3591 Comm: syz-executor153 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sock_setsockopt+0x14a9/0x3a30 net/core/sock.c:1446
__sys_setsockopt+0x5af/0x980 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xb1/0xc0 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc7440fddc9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe98f07968 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc7440fddc9
RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000004 R09: 00007ffe98f07990
R10: 0000000020000000 R11: 0000000000000246 R12: 00007ffe98f0798c
R13: 00007ffe98f079a0 R14: 00007ffe98f079e0 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 25206111 Sun Feb 13 12:06:07 MST 2022 Eric Dumazet <edumazet@google.com> crypto: af_alg - get rid of alg_memory_allocated

alg_memory_allocated does not seem to be really used.

alg_proto does have a .memory_allocated field, but no
corresponding .sysctl_mem.

This means sk_has_account() returns true, but all sk_prot_mem_limits()
users will trigger a NULL dereference [1].

THis was not a problem until SO_RESERVE_MEM addition.

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3591 Comm: syz-executor153 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sock_setsockopt+0x14a9/0x3a30 net/core/sock.c:1446
__sys_setsockopt+0x5af/0x980 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xb1/0xc0 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc7440fddc9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe98f07968 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc7440fddc9
RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000004 R09: 00007ffe98f07990
R10: 0000000020000000 R11: 0000000000000246 R12: 00007ffe98f0798c
R13: 00007ffe98f079a0 R14: 00007ffe98f079e0 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 25206111 Sun Feb 13 12:06:07 MST 2022 Eric Dumazet <edumazet@google.com> crypto: af_alg - get rid of alg_memory_allocated

alg_memory_allocated does not seem to be really used.

alg_proto does have a .memory_allocated field, but no
corresponding .sysctl_mem.

This means sk_has_account() returns true, but all sk_prot_mem_limits()
users will trigger a NULL dereference [1].

THis was not a problem until SO_RESERVE_MEM addition.

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 3591 Comm: syz-executor153 Not tainted 5.17.0-rc3-syzkaller-00316-gb81b1829e7e3 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
sock_setsockopt+0x14a9/0x3a30 net/core/sock.c:1446
__sys_setsockopt+0x5af/0x980 net/socket.c:2176
__do_sys_setsockopt net/socket.c:2191 [inline]
__se_sys_setsockopt net/socket.c:2188 [inline]
__x64_sys_setsockopt+0xb1/0xc0 net/socket.c:2188
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fc7440fddc9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe98f07968 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc7440fddc9
RDX: 0000000000000049 RSI: 0000000000000001 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000004 R09: 00007ffe98f07990
R10: 0000000020000000 R11: 0000000000000246 R12: 00007ffe98f0798c
R13: 00007ffe98f079a0 R14: 00007ffe98f079e0 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sk_prot_mem_limits include/net/sock.h:1523 [inline]
RIP: 0010:sock_reserve_memory+0x1d7/0x330 net/core/sock.c:1000
Code: 08 00 74 08 48 89 ef e8 27 20 bb f9 4c 03 7c 24 10 48 8b 6d 00 48 83 c5 08 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 fb 1f bb f9 48 8b 6d 00 4c 89 ff 48
RSP: 0018:ffffc90001f1fb68 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff88814aabc000 RCX: dffffc0000000000
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff90e18120
RBP: 0000000000000008 R08: dffffc0000000000 R09: fffffbfff21c3025
R10: fffffbfff21c3025 R11: 0000000000000000 R12: ffffffff8d109840
R13: 0000000000001002 R14: 0000000000000001 R15: 0000000000000001
FS: 0000555556e08300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc74416f130 CR3: 0000000073d9e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

Fixes: 2bb2f5fb21b0 ("net: add new socket option SO_RESERVE_MEM")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/drivers/target/
H A Dtarget_core_file.cdiff 3c7ebe95 Fri Feb 03 08:06:16 MST 2023 Christoph Hellwig <hch@lst.de> target: use bvec_set_page to initialize bvecs

Use the bvec_set_page helper to initialize bvecs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230203150634.3199647-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b1dba3d Sat Jan 09 08:53:27 MST 2021 Pavel Begunkov <asml.silence@gmail.com> scsi: target: file: Don't zero iter before iov_iter_bvec

iov_iter_bvec() initialises iterators well, no need to pre-zero it
beforehand as done in fd_execute_rw_aio(). Compilers can't optimise it out
and generate extra code for that (confirmed with assembly).

Link: https://lore.kernel.org/r/34cd22d6cec046e3adf402accb1453cc255b9042.1610207523.git.asml.silence@gmail.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
diff 6b9a44d0 Sun Feb 22 09:48:41 MST 2015 Christoph Hellwig <hch@lst.de> target: use vfs_iter_read/write in fd_do_rw

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff 6f23ac8a Sun Oct 07 08:55:53 MDT 2012 Christoph Hellwig <hch@infradead.org> target: provide generic sbc device type/revision helpers

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff 6bb35e00 Mon Apr 23 09:35:33 MDT 2012 Christoph Hellwig <hch@infradead.org> target: replace ->execute_task with ->execute_cmd

Make CDB emulation work on commands instead of tasks again as a preparation
of removing tasks completely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff 6d180253 Mon Mar 14 05:05:56 MDT 2011 Jesper Juhl <jj@chaosbits.net> [SCSI] target: Fix match_strdup() memory leaks

match_strdup() dynamically allocates memory and it is the responsabillity
of the caller to free that memory. The following three cases:

drivers/target/target_core_file.c:fd_set_configfs_dev_params()
drivers/target/target_core_iblock.c:iblock_set_configfs_dev_params()
drivers/target/target_core_configfs.c:target_core_dev_pr_store_attr_res_aptpl_metadata()

should be kfree()'ing the allocated memory once it is no longer needed.
It also makes sure to return -ENOMEM if the memory allocation in match_strdup()
should fail. For target_core_configfs.c, this patch adds kfree()'s around
Opt_initiator_fabric, Opt_initiator_node, Opt_initiator_sid, Opt_sa_res_key,
Opt_target_fabric, and Opt_target_node for the Persistent Reservations
Activate Persistence across Target Power Loss (APTPL=1) token parsing.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
/linux-master/fs/
H A Ddirect-io.cdiff d44c4042 Tue Jun 13 15:54:39 MDT 2023 David Howells <dhowells@redhat.com> block: Fix dio_cleanup() to advance the head index

Fix dio_bio_cleanup() to advance the head index into the list of pages past
the pages it has released, as __blockdev_direct_IO() will call it twice if
do_direct_IO() fails.

The issue was causing:

WARNING: CPU: 6 PID: 2220 at mm/gup.c:76 try_get_folio

This can be triggered by setting up a clean pair of UDF filesystems on
loopback devices and running the generic/451 xfstest with them as the
scratch and test partitions. Something like the following:

fallocate /mnt2/udf_scratch -l 1G
fallocate /mnt2/udf_test -l 1G
mknod /dev/lo0 b 7 0
mknod /dev/lo1 b 7 1
losetup lo0 /mnt2/udf_scratch
losetup lo1 /mnt2/udf_test
mkfs -t udf /dev/lo0
mkfs -t udf /dev/lo1
cd xfstests
./check generic/451

with xfstests configured by putting the following into local.config:

export FSTYP=udf
export DISABLE_UDF_TEST=1
export TEST_DEV=/dev/lo1
export TEST_DIR=/xfstest.test
export SCRATCH_DEV=/dev/lo0
export SCRATCH_MNT=/xfstest.scratch

Fixes: 1ccf164ec866 ("block: Use iov_iter_extract_pages() and page pinning in direct-io.c")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202306120931.a9606b88-oliver.sang@intel.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@infradead.org>
cc: David Hildenbrand <david@redhat.com>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jan Kara <jack@suse.cz>
cc: Jeff Layton <jlayton@kernel.org>
cc: Jason Gunthorpe <jgg@nvidia.com>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: Hillf Danton <hdanton@sina.com>
cc: Christian Brauner <brauner@kernel.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1193485.1686693279@warthog.procyon.org.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 118f3663 Thu Sep 15 03:42:00 MDT 2022 Christoph Hellwig <hch@lst.de> block: remove PSI accounting from the bio layer

PSI accounting is now done by the VM code, where it should have been
since the beginning.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220915094200.139713-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff c70d868f Mon Oct 14 15:12:11 MDT 2019 Randy Dunlap <rdunlap@infradead.org> fs/direct-io.c: fix kernel-doc warning

Fix kernel-doc warning in fs/direct-io.c:

fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'

Also, don't mark this function as having kernel-doc notation since it is
not exported.

Link: http://lkml.kernel.org/r/97908511-4328-4a56-17fe-f43a1d7aa470@infradead.org
Fixes: 6d544bb4d901 ("dio: centralize completion in dio_complete()")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Zach Brown <zab@zabbo.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 6dc4f100 Fri Feb 15 04:13:19 MST 2019 Ming Lei <ming.lei@redhat.com> block: allow bio_for_each_segment_all() to iterate over multi-page bvec

This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6aa7de05 Mon Oct 23 15:07:29 MDT 2017 Mark Rutland <mark.rutland@arm.com> locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff 9b81c842 Mon Aug 10 17:05:18 MDT 2015 Sasha Levin <sasha.levin@oracle.com> block: don't access bio->bi_error after bio_put()

Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
such as:

[521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
[521120.720638] Read of size 4 by task mount.ocfs2/9644
[521120.721212] =============================================================================
[521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
[521120.722968] -----------------------------------------------------------------------------
[521120.722968]
[521120.723915] Disabling lock debugging due to kernel taint
[521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
[521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
[521120.726037]
[521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
[521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
[521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
[521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
[521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
[521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..<....
[521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 ................
[521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G B 4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420
[521120.744274] ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0
[521120.745025] ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00
[521120.745908] ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854
[521120.747063] Call Trace:
[521120.747520] dump_stack (lib/dump_stack.c:52)
[521120.748053] print_trailer (mm/slub.c:653)
[521120.748582] object_err (mm/slub.c:660)
[521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194)
[521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250)
[521120.753580] dio_bio_complete (fs/direct-io.c:478)
[521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291)
[521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322)
[521120.761658] blkdev_direct_IO (fs/block_dev.c:162)
[521120.762993] generic_file_read_iter (mm/filemap.c:1738)
[521120.767405] blkdev_read_iter (fs/block_dev.c:1649)
[521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
[521120.772126] vfs_read (fs/read_write.c:454)
[521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
[521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[521120.777375] Memory state around the buggy address:
[521120.778118] ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.779211] ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.781465] ^
[521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[521120.784818] ==================================================================

This patch fixes a few of those places that I caught while auditing the patch, but the
original patch should be audited further for more occurences of this issue since I'm
not too familiar with the code.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
diff 9b81c842 Mon Aug 10 17:05:18 MDT 2015 Sasha Levin <sasha.levin@oracle.com> block: don't access bio->bi_error after bio_put()

Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
such as:

[521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
[521120.720638] Read of size 4 by task mount.ocfs2/9644
[521120.721212] =============================================================================
[521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
[521120.722968] -----------------------------------------------------------------------------
[521120.722968]
[521120.723915] Disabling lock debugging due to kernel taint
[521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
[521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
[521120.726037]
[521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff ...6............
[521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
[521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00 ................
[521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff ................
[521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff ...........6....
[521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff .s".....@..<....
[521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00 ................
[521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G B 4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420
[521120.744274] ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0
[521120.745025] ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00
[521120.745908] ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854
[521120.747063] Call Trace:
[521120.747520] dump_stack (lib/dump_stack.c:52)
[521120.748053] print_trailer (mm/slub.c:653)
[521120.748582] object_err (mm/slub.c:660)
[521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194)
[521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250)
[521120.753580] dio_bio_complete (fs/direct-io.c:478)
[521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291)
[521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322)
[521120.761658] blkdev_direct_IO (fs/block_dev.c:162)
[521120.762993] generic_file_read_iter (mm/filemap.c:1738)
[521120.767405] blkdev_read_iter (fs/block_dev.c:1649)
[521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
[521120.772126] vfs_read (fs/read_write.c:454)
[521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
[521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[521120.777375] Memory state around the buggy address:
[521120.778118] ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.779211] ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.781465] ^
[521120.782083] ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.783717] ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[521120.784818] ==================================================================

This patch fixes a few of those places that I caught while auditing the patch, but the
original patch should be audited further for more occurences of this issue since I'm
not too familiar with the code.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
diff 6fcc5420 Sat Jul 19 15:09:04 MDT 2014 Boaz Harrosh <boaz@plexistor.com> direct-io: fix uninitialized warning in do_direct_IO()

The following warnings:

fs/direct-io.c: In function ‘__blockdev_direct_IO’:
fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/direct-io.c:913:16: note: ‘to’ was declared here
fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/direct-io.c:913:10: note: ‘from’ was declared here

are false positive because dio_get_page() either fails, or sets both
'from' and 'to'.

Paul Bolle said ...
Maybe it's better to move initializing "to" and "from" out of
dio_get_page(). That _might_ make it easier for both the the reader and
the compiler to understand what's going on. Something like this:

Christoph Hellwig said ...
The fix of moving the code definitively looks nicer, while I think
uninitialized_var is horrible wart that won't get anywhere near my code.

Boaz Harrosh: I agree with Christoph and Paul

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
H A Daio.cdiff 6ac805d1 Mon Dec 04 10:47:49 MST 2023 Jens Axboe <axboe@kernel.dk> iov_iter: remove unused 'iov' argument from import_single_range()

It is entirely unused, just get rid of it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20231204174827.1258875-2-axboe@kernel.dk
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 8c3cfa80 Thu Aug 17 08:13:35 MDT 2023 Amir Goldstein <amir73il@gmail.com> aio: use kiocb_{start,end}_write() helpers

Use helpers instead of the open coded dance to silence lockdep warnings.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <20230817141337.1025891-6-amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 50252e4b Wed Dec 08 18:04:55 MST 2021 Eric Biggers <ebiggers@google.com> aio: fix use-after-free due to missing POLLFREE handling

signalfd_poll() and binder_poll() are special in that they use a
waitqueue whose lifetime is the current task, rather than the struct
file as is normally the case. This is okay for blocking polls, since a
blocking poll occurs within one task; however, non-blocking polls
require another solution. This solution is for the queue to be cleared
before it is freed, by sending a POLLFREE notification to all waiters.

Unfortunately, only eventpoll handles POLLFREE. A second type of
non-blocking poll, aio poll, was added in kernel v4.18, and it doesn't
handle POLLFREE. This allows a use-after-free to occur if a signalfd or
binder fd is polled with aio poll, and the waitqueue gets freed.

Fix this by making aio poll handle POLLFREE.

A patch by Ramji Jiyani <ramjiyani@google.com>
(https://lore.kernel.org/r/20211027011834.2497484-1-ramjiyani@google.com)
tried to do this by making aio_poll_wake() always complete the request
inline if POLLFREE is seen. However, that solution had two bugs.
First, it introduced a deadlock, as it unconditionally locked the aio
context while holding the waitqueue lock, which inverts the normal
locking order. Second, it didn't consider that POLLFREE notifications
are missed while the request has been temporarily de-queued.

The second problem was solved by my previous patch. This patch then
properly fixes the use-after-free by handling POLLFREE in a
deadlock-free way. It does this by taking advantage of the fact that
freeing of the waitqueue is RCU-delayed, similar to what eventpoll does.

Fixes: 2c14fa838cbe ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.18+
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6af1c849 Thu Apr 04 02:44:05 MDT 2019 Wei Yongjun <weiyongjun1@huawei.com> aio: use kmem_cache_free() instead of kfree()

memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Fixes: fa0ca2aee3be ("deal with get_reqs_available() in aio_get_req() itself")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff 84c4e1f8 Sun Mar 03 15:23:33 MST 2019 Linus Torvalds <torvalds@linux-foundation.org> aio: simplify - and fix - fget/fput for io_submit()

Al Viro root-caused a race where the IOCB_CMD_POLL handling of
fget/fput() could cause us to access the file pointer after it had
already been freed:

"In more details - normally IOCB_CMD_POLL handling looks so:

1) io_submit(2) allocates aio_kiocb instance and passes it to
aio_poll()

2) aio_poll() resolves the descriptor to struct file by req->file =
fget(iocb->aio_fildes)

3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
aio_kiocb to 2 (bumps by 1, that is).

4) aio_poll() calls vfs_poll(). After sanity checks (basically,
"poll_wait() had been called and only once") it locks the queue.
That's what the extra reference to iocb had been for - we know we
can safely access it.

5) With queue locked, we check if ->woken has already been set to
true (by aio_poll_wake()) and, if it had been, we unlock the
queue, drop a reference to aio_kiocb and bugger off - at that
point it's a responsibility to aio_poll_wake() and the stuff
called/scheduled by it. That code will drop the reference to file
in req->file, along with the other reference to our aio_kiocb.

6) otherwise, we see whether we need to wait. If we do, we unlock the
queue, drop one reference to aio_kiocb and go away - eventual
wakeup (or cancel) will deal with the reference to file and with
the other reference to aio_kiocb

7) otherwise we remove ourselves from waitqueue (still under the
queue lock), so that wakeup won't get us. No async activity will
be happening, so we can safely drop req->file and iocb ourselves.

If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
won't get freed under us, so we can do all the checks and locking
safely. And we don't touch ->file if we detect that case.

However, vfs_poll() most certainly *does* touch the file it had been
given. So wakeup coming while we are still in ->poll() might end up
doing fput() on that file. That case is not too rare, and usually we
are saved by the still present reference from descriptor table - that
fput() is not the final one.

But if another thread closes that descriptor right after our fget()
and wakeup does happen before ->poll() returns, we are in trouble -
final fput() done while we are in the middle of a method:

Al also wrote a patch to take an extra reference to the file descriptor
to fix this, but I instead suggested we just streamline the whole file
pointer handling by submit_io() so that the generic aio submission code
simply keeps the file pointer around until the aio has completed.

Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL")
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 6aa7de05 Mon Oct 23 15:07:29 MDT 2017 Mark Rutland <mark.rutland@arm.com> locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()

Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
diff 2916ecc0 Fri Sep 08 17:12:06 MDT 2017 Jérôme Glisse <jglisse@redhat.com> mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY

Introduce a new migration mode that allow to offload the copy to a device
DMA engine. This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed. I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch). The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations. But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 2916ecc0 Fri Sep 08 17:12:06 MDT 2017 Jérôme Glisse <jglisse@redhat.com> mm/migrate: new migrate mode MIGRATE_SYNC_NO_COPY

Introduce a new migration mode that allow to offload the copy to a device
DMA engine. This changes the workflow of migration and not all
address_space migratepage callback can support this.

This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).

No additional per-filesystem migratepage testing is needed. I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch). The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.

Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations. But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.

Usual flow is:

For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}

The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages

To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.

Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.

Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/linux-master/fs/nfs/
H A Ddirect.cdiff a6b5a28e Sat Nov 14 11:43:54 MST 2020 Dave Wysochanski <dwysocha@redhat.com> nfs: Convert to new fscache volume/cookie API

Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.

The following changes have been made:

(1) The fscache_netfs struct is no more, and there's no need to register
the filesystem as a whole.

(2) The session cookie is now an fscache_volume cookie, allocated with
fscache_acquire_volume(). That takes three parameters: a string
representing the "volume" in the index, a string naming the cache to
use (or NULL) and a u64 that conveys coherency metadata for the
volume.

For nfs, I've made it render the volume name string as:

"nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"

(3) The fscache_cookie_def is no more and needed information is passed
directly to fscache_acquire_cookie(). The cache no longer calls back
into the filesystem, but rather metadata changes are indicated at
other times.

fscache_acquire_cookie() is passed the same keying and coherency
information as before.

(4) fscache_enable/disable_cookie() have been removed.

Call fscache_use_cookie() and fscache_unuse_cookie() when a file is
opened or closed to prevent a cache file from being culled and to keep
resources to hand that are needed to do I/O.

If a file is opened for writing, we invalidate it with
FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache,
thereby making it cease caching until all currently open files are
closed. This should give the same behaviour as the uptream code.
Making the cache store local modifications isn't straightforward for
NFS, so that's left for future patches.

(5) fscache_invalidate() now needs to be given uptodate auxiliary data and
a file size. It also takes a flag to indicate if this was due to a
DIO write.

(6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file
to which a DIO write is made.

(7) Call fscache_note_page_release() from nfs_release_page().

(8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
PG_fscache to be cleared.

(9) The functions to read and write data to/from the cache are stubbed out
pending a conversion to use netfslib.

Changes
=======
ver #3:
- Added missing =n fallback for nfs_fscache_release_file()[1][2].

ver #2:
- Use gfpflags_allow_blocking() rather than using flag directly.
- fscache_acquire_volume() now returns errors.
- Remove NFS_INO_FSCACHE as it's no longer used.
- Need to unuse a cookie on file-release, not inode-clear.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna.schumaker@netapp.com>
cc: linux-nfs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/202112100804.nksO8K4u-lkp@intel.com/ [1]
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2]
Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6f673763 Mon Mar 16 05:33:52 MDT 2015 Omar Sandoval <osandov@osandov.com> direct_IO: use iov_iter_rw() instead of rw everywhere

The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff 6cccbb6f Tue Sep 16 15:35:51 MDT 2014 Weston Andros Adamson <dros@primarydata.com> nfs: rename pgio header ds_idx to ds_commit_idx

'ds_commit_idx' is a better name - it is used to select the right
commit bucket for pnfs.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
diff 6de1472f Mon Sep 16 08:53:17 MDT 2013 Al Viro <viro@zeniv.linux.org.uk> nfs: use %p[dD] instead of open-coded (and often racy) equivalents

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff 1385b811 Fri May 04 11:54:24 MDT 2012 Trond Myklebust <Trond.Myklebust@netapp.com> NFS: Fix sparse warnings

Fix the following sparse warnings:

fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
not declared. Should it be static?
fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
'nfs_readhdr_alloc'
fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
was not declared. Should it be static?

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
diff 6d74743b Mon Apr 30 11:27:31 MDT 2012 Trond Myklebust <Trond.Myklebust@netapp.com> NFS: Simplify O_DIRECT page referencing

The O_DIRECT code shouldn't need to hold 2 references to each page. The
reference held by the struct nfs_page should suffice.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
diff 6c75dc0d Fri Apr 20 12:47:47 MDT 2012 Fred Isaman <iisaman@netapp.com> NFS: merge _full and _partial write rpc_ops

Decouple nfs_pgio_header and nfs_write_data, and have (possibly
multiple) nfs_write_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_write_header as a way to preallocate a single
nfs_write_data with the nfs_pgio_header. The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes bug in pnfs_ld_handle_write_error. In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
diff 5a0e3ad6 Wed Mar 24 02:04:11 MDT 2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
diff 5a0e3ad6 Wed Mar 24 02:04:11 MDT 2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
/linux-master/drivers/usb/gadget/function/
H A Df_fs.cdiff efb6b535 Thu May 25 03:28:54 MDT 2023 Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com> usb: gadget: f_fs: Add unbind event before functionfs_unbind

While exercising the unbind path, with the current implementation
the functionfs_unbind would be calling which waits for the ffs->mutex
to be available, however within the same time ffs_ep0_read is invoked
& if no setup packets are pending, it will invoke function
wait_event_interruptible_exclusive_locked_irq which by definition waits
for the ev.count to be increased inside the same mutex for which
functionfs_unbind is waiting.
This creates deadlock situation because the functionfs_unbind won't
get the lock until ev.count is increased which can only happen if
the caller ffs_func_unbind can proceed further.

Following is the illustration:

CPU1 CPU2

ffs_func_unbind() ffs_ep0_read()
mutex_lock(ffs->mutex)
wait_event(ffs->ev.count)
functionfs_unbind()
mutex_lock(ffs->mutex)
mutex_unlock(ffs->mutex)

ffs_event_add()

<deadlock>

Fix this by moving the event unbind before functionfs_unbind
to ensure the ev.count is incrased properly.

Fixes: 6a19da111057 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Cc: stable <stable@kernel.org>
Signed-off-by: Uttkarsh Aggarwal <quic_uaggarwa@quicinc.com>
Link: https://lore.kernel.org/r/20230525092854.7992-1-quic_uaggarwa@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 921deb9d Tue Jan 24 02:11:49 MST 2023 Udipto Goswami <quic_ugoswami@quicinc.com> usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait

__ffs_ep0_queue_wait executes holding the spinlock of &ffs->ev.waitq.lock
and unlocks it after the assignments to usb_request are done.
However in the code if the request is already NULL we bail out returning
-EINVAL but never unlocked the spinlock.

Fix this by adding spin_unlock_irq &ffs->ev.waitq.lock before returning.

Fixes: 6a19da111057 ("usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait")
Reviewed-by: John Keeping <john@metanate.com>
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Link: https://lore.kernel.org/r/20230124091149.18647-1-quic_ugoswami@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 6a19da11 Wed Dec 14 22:29:05 MST 2022 Udipto Goswami <quic_ugoswami@quicinc.com> usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait

While performing fast composition switch, there is a possibility that the
process of ffs_ep0_write/ffs_ep0_read get into a race condition
due to ep0req being freed up from functionfs_unbind.

Consider the scenario that the ffs_ep0_write calls the ffs_ep0_queue_wait
by taking a lock &ffs->ev.waitq.lock. However, the functionfs_unbind isn't
bounded so it can go ahead and mark the ep0req to NULL, and since there
is no NULL check in ffs_ep0_queue_wait we will end up in use-after-free.

Fix this by making a serialized execution between the two functions using
a mutex_lock(ffs->mutex).

Fixes: ddf8abd25994 ("USB: f_fs: the FunctionFS driver")
Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com>
Tested-by: Krishna Kurapati <quic_kriskura@quicinc.com>
Link: https://lore.kernel.org/r/20221215052906.8993-2-quic_ugoswami@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff ecfbd7b9 Wed Jun 02 23:15:07 MDT 2021 Andrew Gabbasov <andrew_gabbasov@mentor.com> usb: gadget: f_fs: Fix setting of device and driver data cross-references

FunctionFS device structure 'struct ffs_dev' and driver data structure
'struct ffs_data' are bound to each other with cross-reference pointers
'ffs_data->private_data' and 'ffs_dev->ffs_data'. While the first one
is supposed to be valid through the whole life of 'struct ffs_data'
(and while 'struct ffs_dev' exists non-freed), the second one is cleared
in 'ffs_closed()' (called from 'ffs_data_reset()' or the last
'ffs_data_put()'). This can be called several times, alternating in
different order with 'ffs_free_inst()', that, if possible, clears
the other cross-reference.

As a result, different cases of these calls order may leave stale
cross-reference pointers, used when the pointed structure is already
freed. Even if it occasionally doesn't cause kernel crash, this error
is reported by KASAN-enabled kernel configuration.

For example, the case [last 'ffs_data_put()' - 'ffs_free_inst()'] was
fixed by commit cdafb6d8b8da ("usb: gadget: f_fs: Fix use-after-free in
ffs_free_inst").

The other case ['ffs_data_reset()' - 'ffs_free_inst()' - 'ffs_data_put()']
now causes KASAN reported error [1], when 'ffs_data_reset()' clears
'ffs_dev->ffs_data', then 'ffs_free_inst()' frees the 'struct ffs_dev',
but can't clear 'ffs_data->private_data', which is then accessed
in 'ffs_closed()' called from 'ffs_data_put()'. This happens since
'ffs_dev->ffs_data' reference is cleared too early.

Moreover, one more use case, when 'ffs_free_inst()' is called immediately
after mounting FunctionFS device (that is before the descriptors are
written and 'ffs_ready()' is called), and then 'ffs_data_reset()'
or 'ffs_data_put()' is called from accessing "ep0" file or unmounting
the device. This causes KASAN error report like [2], since
'ffs_dev->ffs_data' is not yet set when 'ffs_free_inst()' can't properly
clear 'ffs_data->private_data', that is later accessed to freed structure.

Fix these (and may be other) cases of stale pointers access by moving
setting and clearing of the mentioned cross-references to the single
places, setting both of them when 'struct ffs_data' is created and
bound to 'struct ffs_dev', and clearing both of them when one of the
structures is destroyed. It seems convenient to make this pointer
initialization and structures binding in 'ffs_acquire_dev()' and
make pointers clearing in 'ffs_release_dev()'. This required some
changes in these functions parameters and return types.

Also, 'ffs_release_dev()' calling requires some cleanup, fixing minor
issues, like (1) 'ffs_release_dev()' is not called if 'ffs_free_inst()'
is called without unmounting the device, and "release_dev" callback
is not called at all, or (2) "release_dev" callback is called before
"ffs_closed" callback on unmounting, which seems to be not correctly
nested with "acquire_dev" and "ffs_ready" callbacks.
Make this cleanup togther with other mentioned 'ffs_release_dev()' changes.

[1]
==================================================================
root@rcar-gen3:~# mkdir /dev/cfs
root@rcar-gen3:~# mkdir /dev/ffs
root@rcar-gen3:~# modprobe libcomposite
root@rcar-gen3:~# mount -t configfs none /dev/cfs
root@rcar-gen3:~# mkdir /dev/cfs/usb_gadget/g1
root@rcar-gen3:~# mkdir /dev/cfs/usb_gadget/g1/functions/ffs.ffs
[ 64.340664] file system registered
root@rcar-gen3:~# mount -t functionfs ffs /dev/ffs
root@rcar-gen3:~# cd /dev/ffs
root@rcar-gen3:/dev/ffs# /home/root/ffs-test
ffs-test: info: ep0: writing descriptors (in v2 format)
[ 83.181442] read descriptors
[ 83.186085] read strings
ffs-test: info: ep0: writing strings
ffs-test: dbg: ep1: starting
ffs-test: dbg: ep2: starting
ffs-test: info: ep1: starts
ffs-test: info: ep2: starts
ffs-test: info: ep0: starts

^C
root@rcar-gen3:/dev/ffs# cd /home/root/
root@rcar-gen3:~# rmdir /dev/cfs/usb_gadget/g1/functions/ffs.ffs
[ 98.935061] unloading
root@rcar-gen3:~# umount /dev/ffs
[ 102.734301] ==================================================================
[ 102.742059] BUG: KASAN: use-after-free in ffs_release_dev+0x64/0xa8 [usb_f_fs]
[ 102.749683] Write of size 1 at addr ffff0004d46ff549 by task umount/2997
[ 102.756709]
[ 102.758311] CPU: 0 PID: 2997 Comm: umount Not tainted 5.13.0-rc4+ #8
[ 102.764971] Hardware name: Renesas Salvator-X board based on r8a77951 (DT)
[ 102.772179] Call trace:
[ 102.774779] dump_backtrace+0x0/0x330
[ 102.778653] show_stack+0x20/0x2c
[ 102.782152] dump_stack+0x11c/0x1ac
[ 102.785833] print_address_description.constprop.0+0x30/0x274
[ 102.791862] kasan_report+0x14c/0x1c8
[ 102.795719] __asan_report_store1_noabort+0x34/0x58
[ 102.800840] ffs_release_dev+0x64/0xa8 [usb_f_fs]
[ 102.805801] ffs_fs_kill_sb+0x50/0x84 [usb_f_fs]
[ 102.810663] deactivate_locked_super+0xa0/0xf0
[ 102.815339] deactivate_super+0x98/0xac
[ 102.819378] cleanup_mnt+0xd0/0x1b0
[ 102.823057] __cleanup_mnt+0x1c/0x28
[ 102.826823] task_work_run+0x104/0x180
[ 102.830774] do_notify_resume+0x458/0x14e0
[ 102.835083] work_pending+0xc/0x5f8
[ 102.838762]
[ 102.840357] Allocated by task 2988:
[ 102.844032] kasan_save_stack+0x28/0x58
[ 102.848071] kasan_set_track+0x28/0x3c
[ 102.852016] ____kasan_kmalloc+0x84/0x9c
[ 102.856142] __kasan_kmalloc+0x10/0x1c
[ 102.860088] __kmalloc+0x214/0x2f8
[ 102.863678] kzalloc.constprop.0+0x14/0x20 [usb_f_fs]
[ 102.868990] ffs_alloc_inst+0x8c/0x208 [usb_f_fs]
[ 102.873942] try_get_usb_function_instance+0xf0/0x164 [libcomposite]
[ 102.880629] usb_get_function_instance+0x64/0x68 [libcomposite]
[ 102.886858] function_make+0x128/0x1ec [libcomposite]
[ 102.892185] configfs_mkdir+0x330/0x590 [configfs]
[ 102.897245] vfs_mkdir+0x12c/0x1bc
[ 102.900835] do_mkdirat+0x180/0x1d0
[ 102.904513] __arm64_sys_mkdirat+0x80/0x94
[ 102.908822] invoke_syscall+0xf8/0x25c
[ 102.912772] el0_svc_common.constprop.0+0x150/0x1a0
[ 102.917891] do_el0_svc+0xa0/0xd4
[ 102.921386] el0_svc+0x24/0x34
[ 102.924613] el0_sync_handler+0xcc/0x154
[ 102.928743] el0_sync+0x198/0x1c0
[ 102.932238]
[ 102.933832] Freed by task 2996:
[ 102.937144] kasan_save_stack+0x28/0x58
[ 102.941181] kasan_set_track+0x28/0x3c
[ 102.945128] kasan_set_free_info+0x28/0x4c
[ 102.949435] ____kasan_slab_free+0x104/0x118
[ 102.953921] __kasan_slab_free+0x18/0x24
[ 102.958047] slab_free_freelist_hook+0x148/0x1f0
[ 102.962897] kfree+0x318/0x440
[ 102.966123] ffs_free_inst+0x164/0x2d8 [usb_f_fs]
[ 102.971075] usb_put_function_instance+0x84/0xa4 [libcomposite]
[ 102.977302] ffs_attr_release+0x18/0x24 [usb_f_fs]
[ 102.982344] config_item_put+0x140/0x1a4 [configfs]
[ 102.987486] configfs_rmdir+0x3fc/0x518 [configfs]
[ 102.992535] vfs_rmdir+0x114/0x234
[ 102.996122] do_rmdir+0x274/0x2b0
[ 102.999617] __arm64_sys_unlinkat+0x94/0xc8
[ 103.004015] invoke_syscall+0xf8/0x25c
[ 103.007961] el0_svc_common.constprop.0+0x150/0x1a0
[ 103.013080] do_el0_svc+0xa0/0xd4
[ 103.016575] el0_svc+0x24/0x34
[ 103.019801] el0_sync_handler+0xcc/0x154
[ 103.023930] el0_sync+0x198/0x1c0
[ 103.027426]
[ 103.029020] The buggy address belongs to the object at ffff0004d46ff500
[ 103.029020] which belongs to the cache kmalloc-128 of size 128
[ 103.042079] The buggy address is located 73 bytes inside of
[ 103.042079] 128-byte region [ffff0004d46ff500, ffff0004d46ff580)
[ 103.054236] The buggy address belongs to the page:
[ 103.059262] page:0000000021aa849b refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0004d46fee00 pfn:0x5146fe
[ 103.070437] head:0000000021aa849b order:1 compound_mapcount:0
[ 103.076456] flags: 0x8000000000010200(slab|head|zone=2)
[ 103.081948] raw: 8000000000010200 fffffc0013521a80 0000000d0000000d ffff0004c0002300
[ 103.090052] raw: ffff0004d46fee00 000000008020001e 00000001ffffffff 0000000000000000
[ 103.098150] page dumped because: kasan: bad access detected
[ 103.103985]
[ 103.105578] Memory state around the buggy address:
[ 103.110602] ffff0004d46ff400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 103.118161] ffff0004d46ff480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 103.125726] >ffff0004d46ff500: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 103.133284] ^
[ 103.139120] ffff0004d46ff580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 103.146679] ffff0004d46ff600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 103.154238] ==================================================================
[ 103.161792] Disabling lock debugging due to kernel taint
[ 103.167319] Unable to handle kernel paging request at virtual address 0037801d6000018e
[ 103.175406] Mem abort info:
[ 103.178457] ESR = 0x96000004
[ 103.181609] EC = 0x25: DABT (current EL), IL = 32 bits
[ 103.187020] SET = 0, FnV = 0
[ 103.190185] EA = 0, S1PTW = 0
[ 103.193417] Data abort info:
[ 103.196385] ISV = 0, ISS = 0x00000004
[ 103.200315] CM = 0, WnR = 0
[ 103.203366] [0037801d6000018e] address between user and kernel address ranges
[ 103.210611] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 103.216231] Modules linked in: usb_f_fs libcomposite configfs ath9k_htc led_class mac80211 libarc4 ath9k_common ath9k_hw ath cfg80211 aes_ce_blk sata_rc4
[ 103.259233] CPU: 0 PID: 2997 Comm: umount Tainted: G B 5.13.0-rc4+ #8
[ 103.267031] Hardware name: Renesas Salvator-X board based on r8a77951 (DT)
[ 103.273951] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 103.280001] pc : ffs_data_clear+0x138/0x370 [usb_f_fs]
[ 103.285197] lr : ffs_data_clear+0x124/0x370 [usb_f_fs]
[ 103.290385] sp : ffff800014777a80
[ 103.293725] x29: ffff800014777a80 x28: ffff0004d7649c80 x27: 0000000000000000
[ 103.300931] x26: ffff800014777fb0 x25: ffff60009aec9394 x24: ffff0004d7649ca4
[ 103.308136] x23: 1fffe0009a3d063a x22: dfff800000000000 x21: ffff0004d1e831d0
[ 103.315340] x20: e1c000eb00000bb4 x19: ffff0004d1e83000 x18: 0000000000000000
[ 103.322545] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 103.329748] x14: 0720072007200720 x13: 0720072007200720 x12: 1ffff000012ef658
[ 103.336952] x11: ffff7000012ef658 x10: 0720072007200720 x9 : ffff800011322648
[ 103.344157] x8 : ffff800014777818 x7 : ffff80000977b2c7 x6 : 0000000000000000
[ 103.351359] x5 : 0000000000000001 x4 : ffff7000012ef659 x3 : 0000000000000001
[ 103.358562] x2 : 0000000000000000 x1 : 1c38001d6000018e x0 : e1c000eb00000c70
[ 103.365766] Call trace:
[ 103.368235] ffs_data_clear+0x138/0x370 [usb_f_fs]
[ 103.373076] ffs_data_reset+0x20/0x304 [usb_f_fs]
[ 103.377829] ffs_data_closed+0x1ec/0x244 [usb_f_fs]
[ 103.382755] ffs_fs_kill_sb+0x70/0x84 [usb_f_fs]
[ 103.387420] deactivate_locked_super+0xa0/0xf0
[ 103.391905] deactivate_super+0x98/0xac
[ 103.395776] cleanup_mnt+0xd0/0x1b0
[ 103.399299] __cleanup_mnt+0x1c/0x28
[ 103.402906] task_work_run+0x104/0x180
[ 103.406691] do_notify_resume+0x458/0x14e0
[ 103.410823] work_pending+0xc/0x5f8
[ 103.414351] Code: b4000a54 9102f280 12000802 d343fc01 (38f66821)
[ 103.420490] ---[ end trace 57b43a50e8244f57 ]---
Segmentation fault
root@rcar-gen3:~#
==================================================================

[2]
==================================================================
root@rcar-gen3:~# mkdir /dev/ffs
root@rcar-gen3:~# modprobe libcomposite
root@rcar-gen3:~#
root@rcar-gen3:~# mount -t configfs none /dev/cfs
root@rcar-gen3:~# mkdir /dev/cfs/usb_gadget/g1
root@rcar-gen3:~# mkdir /dev/cfs/usb_gadget/g1/functions/ffs.ffs
[ 54.766480] file system registered
root@rcar-gen3:~# mount -t functionfs ffs /dev/ffs
root@rcar-gen3:~# rmdir /dev/cfs/usb_gadget/g1/functions/ffs.ffs
[ 63.197597] unloading
root@rcar-gen3:~# cat /dev/ffs/ep0
cat: read error:[ 67.213506] ==================================================================
[ 67.222095] BUG: KASAN: use-after-free in ffs_data_clear+0x70/0x370 [usb_f_fs]
[ 67.229699] Write of size 1 at addr ffff0004c26e974a by task cat/2994
[ 67.236446]
[ 67.238045] CPU: 0 PID: 2994 Comm: cat Not tainted 5.13.0-rc4+ #8
[ 67.244431] Hardware name: Renesas Salvator-X board based on r8a77951 (DT)
[ 67.251624] Call trace:
[ 67.254212] dump_backtrace+0x0/0x330
[ 67.258081] show_stack+0x20/0x2c
[ 67.261579] dump_stack+0x11c/0x1ac
[ 67.265260] print_address_description.constprop.0+0x30/0x274
[ 67.271286] kasan_report+0x14c/0x1c8
[ 67.275143] __asan_report_store1_noabort+0x34/0x58
[ 67.280265] ffs_data_clear+0x70/0x370 [usb_f_fs]
[ 67.285220] ffs_data_reset+0x20/0x304 [usb_f_fs]
[ 67.290172] ffs_data_closed+0x240/0x244 [usb_f_fs]
[ 67.295305] ffs_ep0_release+0x40/0x54 [usb_f_fs]
[ 67.300256] __fput+0x304/0x580
[ 67.303576] ____fput+0x18/0x24
[ 67.306893] task_work_run+0x104/0x180
[ 67.310846] do_notify_resume+0x458/0x14e0
[ 67.315154] work_pending+0xc/0x5f8
[ 67.318834]
[ 67.320429] Allocated by task 2988:
[ 67.324105] kasan_save_stack+0x28/0x58
[ 67.328144] kasan_set_track+0x28/0x3c
[ 67.332090] ____kasan_kmalloc+0x84/0x9c
[ 67.336217] __kasan_kmalloc+0x10/0x1c
[ 67.340163] __kmalloc+0x214/0x2f8
[ 67.343754] kzalloc.constprop.0+0x14/0x20 [usb_f_fs]
[ 67.349066] ffs_alloc_inst+0x8c/0x208 [usb_f_fs]
[ 67.354017] try_get_usb_function_instance+0xf0/0x164 [libcomposite]
[ 67.360705] usb_get_function_instance+0x64/0x68 [libcomposite]
[ 67.366934] function_make+0x128/0x1ec [libcomposite]
[ 67.372260] configfs_mkdir+0x330/0x590 [configfs]
[ 67.377320] vfs_mkdir+0x12c/0x1bc
[ 67.380911] do_mkdirat+0x180/0x1d0
[ 67.384589] __arm64_sys_mkdirat+0x80/0x94
[ 67.388899] invoke_syscall+0xf8/0x25c
[ 67.392850] el0_svc_common.constprop.0+0x150/0x1a0
[ 67.397969] do_el0_svc+0xa0/0xd4
[ 67.401464] el0_svc+0x24/0x34
[ 67.404691] el0_sync_handler+0xcc/0x154
[ 67.408819] el0_sync+0x198/0x1c0
[ 67.412315]
[ 67.413909] Freed by task 2993:
[ 67.417220] kasan_save_stack+0x28/0x58
[ 67.421257] kasan_set_track+0x28/0x3c
[ 67.425204] kasan_set_free_info+0x28/0x4c
[ 67.429513] ____kasan_slab_free+0x104/0x118
[ 67.434001] __kasan_slab_free+0x18/0x24
[ 67.438128] slab_free_freelist_hook+0x148/0x1f0
[ 67.442978] kfree+0x318/0x440
[ 67.446205] ffs_free_inst+0x164/0x2d8 [usb_f_fs]
[ 67.451156] usb_put_function_instance+0x84/0xa4 [libcomposite]
[ 67.457385] ffs_attr_release+0x18/0x24 [usb_f_fs]
[ 67.462428] config_item_put+0x140/0x1a4 [configfs]
[ 67.467570] configfs_rmdir+0x3fc/0x518 [configfs]
[ 67.472626] vfs_rmdir+0x114/0x234
[ 67.476215] do_rmdir+0x274/0x2b0
[ 67.479710] __arm64_sys_unlinkat+0x94/0xc8
[ 67.484108] invoke_syscall+0xf8/0x25c
[ 67.488055] el0_svc_common.constprop.0+0x150/0x1a0
[ 67.493175] do_el0_svc+0xa0/0xd4
[ 67.496671] el0_svc+0x24/0x34
[ 67.499896] el0_sync_handler+0xcc/0x154
[ 67.504024] el0_sync+0x198/0x1c0
[ 67.507520]
[ 67.509114] The buggy address belongs to the object at ffff0004c26e9700
[ 67.509114] which belongs to the cache kmalloc-128 of size 128
[ 67.522171] The buggy address is located 74 bytes inside of
[ 67.522171] 128-byte region [ffff0004c26e9700, ffff0004c26e9780)
[ 67.534328] The buggy address belongs to the page:
[ 67.539355] page:000000003177a217 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5026e8
[ 67.549175] head:000000003177a217 order:1 compound_mapcount:0
[ 67.555195] flags: 0x8000000000010200(slab|head|zone=2)
[ 67.560687] raw: 8000000000010200 fffffc0013037100 0000000c00000002 ffff0004c0002300
[ 67.568791] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 67.576890] page dumped because: kasan: bad access detected
[ 67.582725]
[ 67.584318] Memory state around the buggy address:
[ 67.589343] ffff0004c26e9600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 67.596903] ffff0004c26e9680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.604463] >ffff0004c26e9700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 67.612022] ^
[ 67.617860] ffff0004c26e9780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.625421] ffff0004c26e9800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 67.632981] ==================================================================
[ 67.640535] Disabling lock debugging due to kernel taint
File descriptor[ 67.646100] Unable to handle kernel paging request at virtual address fabb801d4000018d
in bad state
[ 67.655456] Mem abort info:
[ 67.659619] ESR = 0x96000004
[ 67.662801] EC = 0x25: DABT (current EL), IL = 32 bits
[ 67.668225] SET = 0, FnV = 0
[ 67.671375] EA = 0, S1PTW = 0
[ 67.674613] Data abort info:
[ 67.677587] ISV = 0, ISS = 0x00000004
[ 67.681522] CM = 0, WnR = 0
[ 67.684588] [fabb801d4000018d] address between user and kernel address ranges
[ 67.691849] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 67.697470] Modules linked in: usb_f_fs libcomposite configfs ath9k_htc led_class mac80211 libarc4 ath9k_common ath9k_hw ath cfg80211 aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce gf128mul sha2_ce sha1_ce evdev sata_rcar libata xhci_plat_hcd scsi_mod xhci_hcd rene4
[ 67.740467] CPU: 0 PID: 2994 Comm: cat Tainted: G B 5.13.0-rc4+ #8
[ 67.748005] Hardware name: Renesas Salvator-X board based on r8a77951 (DT)
[ 67.754924] pstate: 00000005 (nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 67.760974] pc : ffs_data_clear+0x138/0x370 [usb_f_fs]
[ 67.766178] lr : ffs_data_clear+0x124/0x370 [usb_f_fs]
[ 67.771365] sp : ffff800014767ad0
[ 67.774706] x29: ffff800014767ad0 x28: ffff800009cf91c0 x27: ffff0004c54861a0
[ 67.781913] x26: ffff0004dc90b288 x25: 1fffe00099ec10f5 x24: 00000000000a801d
[ 67.789118] x23: 1fffe00099f6953a x22: dfff800000000000 x21: ffff0004cfb4a9d0
[ 67.796322] x20: d5e000ea00000bb1 x19: ffff0004cfb4a800 x18: 0000000000000000
[ 67.803526] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 67.810730] x14: 0720072007200720 x13: 0720072007200720 x12: 1ffff000028ecefa
[ 67.817934] x11: ffff7000028ecefa x10: 0720072007200720 x9 : ffff80001132c014
[ 67.825137] x8 : ffff8000147677d8 x7 : ffff8000147677d7 x6 : 0000000000000000
[ 67.832341] x5 : 0000000000000001 x4 : ffff7000028ecefb x3 : 0000000000000001
[ 67.839544] x2 : 0000000000000005 x1 : 1abc001d4000018d x0 : d5e000ea00000c6d
[ 67.846748] Call trace:
[ 67.849218] ffs_data_clear+0x138/0x370 [usb_f_fs]
[ 67.854058] ffs_data_reset+0x20/0x304 [usb_f_fs]
[ 67.858810] ffs_data_closed+0x240/0x244 [usb_f_fs]
[ 67.863736] ffs_ep0_release+0x40/0x54 [usb_f_fs]
[ 67.868488] __fput+0x304/0x580
[ 67.871665] ____fput+0x18/0x24
[ 67.874837] task_work_run+0x104/0x180
[ 67.878622] do_notify_resume+0x458/0x14e0
[ 67.882754] work_pending+0xc/0x5f8
[ 67.886282] Code: b4000a54 9102f280 12000802 d343fc01 (38f66821)
[ 67.892422] ---[ end trace 6d7cedf53d7abbea ]---
Segmentation fault
root@rcar-gen3:~#
==================================================================

Fixes: 4b187fceec3c ("usb: gadget: FunctionFS: add devices management code")
Fixes: 3262ad824307 ("usb: gadget: f_fs: Stop ffs_closed NULL pointer dereference")
Fixes: cdafb6d8b8da ("usb: gadget: f_fs: Fix use-after-free in ffs_free_inst")
Reported-by: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>
Tested-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Reviewed-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Link: https://lore.kernel.org/r/20210603171507.22514-1-andrew_gabbasov@mentor.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 6fc1db5e Fri May 21 18:44:21 MDT 2021 Wesley Cheng <wcheng@codeaurora.org> usb: gadget: f_fs: Ensure io_completion_wq is idle during unbind

During unbind, ffs_func_eps_disable() will be executed, resulting in
completion callbacks for any pending USB requests. When using AIO,
irrespective of the completion status, io_data work is queued to
io_completion_wq to evaluate and handle the completed requests. Since
work runs asynchronously to the unbind() routine, there can be a
scenario where the work runs after the USB gadget has been fully
removed, resulting in accessing of a resource which has been already
freed. (i.e. usb_ep_free_request() accessing the USB ep structure)

Explicitly drain the io_completion_wq, instead of relying on the
destroy_workqueue() (in ffs_data_put()) to make sure no pending
completion work items are running.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/1621644261-1236-1-git-send-email-wcheng@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff 8704fd73 Fri Nov 27 07:05:59 MST 2020 Greg Kroah-Hartman <gregkh@linuxfoundation.org> USB: gadget: f_fs: remove likely/unlikely

They are used way too often in this file, in some ways that are actually
wrong. Almost all of these are already known by the compiler and CPU so
just remove them all as none of these should be on any "hot paths" where
it actually matters.

Cc: Felipe Balbi <balbi@kernel.org>
Reported-by: Peter Chen <peter.chen@nxp.com>
Reviewed-by: Peter Chen <peter.chen@nxp.com>
Link: https://lore.kernel.org/r/20201127140559.381351-6-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff f5678e7f Wed Jun 10 19:42:06 MDT 2020 Christoph Hellwig <hch@lst.de> kernel: better document the use_mm/unuse_mm API contract

Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.au

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: Haren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff 6cf439e0 Wed Jan 24 01:11:53 MST 2018 Jack Pham <jackp@codeaurora.org> usb: gadget: f_fs: Process all descriptors during bind

During _ffs_func_bind(), the received descriptors are evaluated
to prepare for binding with the gadget in order to allocate
endpoints and optionally set up OS descriptors. However, the
high- and super-speed descriptors are only parsed based on
whether the gadget_is_dualspeed() and gadget_is_superspeed()
calls are true, respectively.

This is a problem in case a userspace program always provides
all of the {full,high,super,OS} descriptors when configuring a
function. Then, for example if a gadget device is not capable
of SuperSpeed, the call to ffs_do_descs() for the SS descriptors
is skipped, resulting in an incorrect offset calculation for
the vla_ptr when moving on to the OS descriptors that follow.
This causes ffs_do_os_descs() to fail as it is now looking at
the SS descriptors' offset within the raw_descs buffer instead.

_ffs_func_bind() should evaluate the descriptors unconditionally,
so remove the checks for gadget speed.

Fixes: f0175ab51993 ("usb: gadget: f_fs: OS descriptors support")
Cc: stable@vger.kernel.org
Co-Developed-by: Mayank Rana <mrana@codeaurora.org>
Signed-off-by: Mayank Rana <mrana@codeaurora.org>
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
/linux-master/fs/ceph/
H A Dfile.cdiff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6a81749e Wed Nov 13 07:56:06 MST 2019 Jeff Layton <jlayton@kernel.org> ceph: increment/decrement dio counter on async requests

Ceph can in some cases issue an async DIO request, in which case we can
end up calling ceph_end_io_direct before the I/O is actually complete.
That may allow buffered operations to proceed while DIO requests are
still in flight.

Fix this by incrementing the i_dio_count when issuing an async DIO
request, and decrement it when tearing down the aio_req.

Fixes: 321fe13c9398 ("ceph: add buffered/direct exclusionary locking for reads and writes")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
diff 6fd4e634 Mon Sep 09 09:48:54 MDT 2019 Luis Henriques <lhenriques@suse.com> ceph: allow object copies across different filesystems in the same cluster

OSDs are able to perform object copies across different pools. Thus,
there's no need to prevent copy_file_range from doing remote copies if the
source and destination superblocks are different. Only return -EXDEV if
they have different fsid (the cluster ID).

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
diff 0a4c9265 Wed Jan 23 01:48:28 MST 2019 Gustavo A. R. Silva <gustavo@embeddedor.com> fs: mark expected switch fall-throughs

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
/linux-master/drivers/block/
H A Dloop.cdiff 6f8191fd Sun Jun 19 00:05:51 MDT 2022 Christoph Hellwig <hch@lst.de> block: simplify disk shutdown

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.

Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.

This saves an extra queue freeze for devices without a separately
allocated queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6f8191fd Sun Jun 19 00:05:51 MDT 2022 Christoph Hellwig <hch@lst.de> block: simplify disk shutdown

Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for
all disks that do not have separately allocated queues, and thus remove
the need to call blk_cleanup_queue for them.

Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that
this function is intended only for separately allocated blk-mq queues.

This saves an extra queue freeze for devices without a separately
allocated queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 3ce6e1f6 Tue Jul 06 08:40:34 MDT 2021 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> loop: reintroduce global lock for safe loop_validate_file() traversal

Commit 6cc8e7430801fa23 ("loop: scale loop device by introducing per
device lock") re-opened a race window for NULL pointer dereference at
loop_validate_file() where commit 310ca162d779efee ("block/loop: Use
global lock for ioctl() operation.") has closed.

Although we need to guarantee that other loop devices will not change
during traversal, we can't take remote "struct loop_device"->lo_mutex
inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore,
introduce a global lock dedicated for loop_validate_file() which is
conditionally taken before local "struct loop_device"->lo_mutex is taken.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 3ce6e1f6 Tue Jul 06 08:40:34 MDT 2021 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> loop: reintroduce global lock for safe loop_validate_file() traversal

Commit 6cc8e7430801fa23 ("loop: scale loop device by introducing per
device lock") re-opened a race window for NULL pointer dereference at
loop_validate_file() where commit 310ca162d779efee ("block/loop: Use
global lock for ioctl() operation.") has closed.

Although we need to guarantee that other loop devices will not change
during traversal, we can't take remote "struct loop_device"->lo_mutex
inside loop_validate_file() in order to avoid AB-BA deadlock. Therefore,
introduce a global lock dedicated for loop_validate_file() which is
conditionally taken before local "struct loop_device"->lo_mutex is taken.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cc8e7430801fa23 ("loop: scale loop device by introducing per device lock")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff f9d10764 Wed Jun 23 08:59:04 MDT 2021 Christoph Hellwig <hch@lst.de> loop: split loop_control_ioctl

Split loop_control_ioctl into a helper for each command. This keeps the
code nicely separated for the upcoming locking changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Link: https://lore.kernel.org/r/20210623145908.92973-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6a03cd98 Wed Jun 16 01:15:47 MDT 2021 Christoph Hellwig <hch@lst.de> loop: fix order of cleaning up the queue and freeing the tagset

We must release the queue before freeing the tagset.

Fixes: 1c99502fae35 ("loop: use blk_mq_alloc_disk and blk_cleanup_disk")
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6f24784f Sun Jan 31 17:23:55 MST 2021 Al Viro <viro@zeniv.linux.org.uk> whack-a-mole: don't open-code iminor/imajor

several instances creeped back into the tree...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff 6cc8e743 Tue Jan 26 07:46:30 MST 2021 Pavel Tatashin <pasha.tatashin@soleen.com> loop: scale loop device by introducing per device lock

Currently, loop device has only one global lock: loop_ctl_mutex.

This becomes hot in scenarios where many loop devices are used.

Scale it by introducing per-device lock: lo_mutex that protects
modifications of all fields in struct loop_device.

Keep loop_ctl_mutex to protect global data: loop_index_idr, loop_lookup,
loop_add.

The new lock ordering requirement is that loop_ctl_mutex must be taken
before lo_mutex.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
/linux-master/fs/fuse/
H A Dfile.cdiff 3f29f1c3 Sun Dec 03 00:42:33 MST 2023 Amir Goldstein <amir73il@gmail.com> fuse: disable FOPEN_PARALLEL_DIRECT_WRITES with FUSE_DIRECT_IO_ALLOW_MMAP

The new fuse init flag FUSE_DIRECT_IO_ALLOW_MMAP breaks assumptions made by
FOPEN_PARALLEL_DIRECT_WRITES and causes test generic/095 to hit
BUG_ON(fi->writectr < 0) assertions in fuse_set_nowrite():

generic/095 5s ...
kernel BUG at fs/fuse/dir.c:1756!
...
? fuse_set_nowrite+0x3d/0xdd
? do_raw_spin_unlock+0x88/0x8f
? _raw_spin_unlock+0x2d/0x43
? fuse_range_is_writeback+0x71/0x84
fuse_sync_writes+0xf/0x19
fuse_direct_io+0x167/0x5bd
fuse_direct_write_iter+0xf0/0x146

Auto disable FOPEN_PARALLEL_DIRECT_WRITES when server negotiated
FUSE_DIRECT_IO_ALLOW_MMAP.

Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff c55e0a55 Tue Sep 19 20:40:00 MDT 2023 Tyler Fanelli <tfanelli@redhat.com> fuse: Rename DIRECT_IO_RELAX to DIRECT_IO_ALLOW_MMAP

Although DIRECT_IO_RELAX's initial usage is to allow shared mmap, its
description indicates a purpose of reducing memory footprint. This
may imply that it could be further used to relax other DIRECT_IO
operations in the future.

Replace it with a flag DIRECT_IO_ALLOW_MMAP which does only one thing,
allow shared mmap of DIRECT_IO files while still bypassing the cache
on regular reads and writes.

[Miklos] Also Keep DIRECT_IO_RELAX definition for backward compatibility.

Signed-off-by: Tyler Fanelli <tfanelli@redhat.com>
Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Cc: <stable@vger.kernel.org> # v6.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 035ff33c Wed Apr 20 08:05:41 MDT 2022 Miklos Szeredi <mszeredi@redhat.com> fuse: write inode in fuse_release()

A race between write(2) and close(2) allows pages to be dirtied after
fuse_flush -> write_inode_now(). If these pages are not flushed from
fuse_release(), then there might not be a writable open file later. So any
remaining dirty pages must be written back before the file is released.

This is a partial revert of the blamed commit.

Reported-by: syzbot+6e1efbd8efaaa6860e91@syzkaller.appspotmail.com
Fixes: 36ea23374d1f ("fuse: write inode in fuse_vma_close() instead of fuse_release()")
Cc: <stable@vger.kernel.org> # v5.16
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 6b19b766 Thu Oct 21 09:22:35 MDT 2021 Jens Axboe <axboe@kernel.dk> fs: get rid of the res2 iocb->ki_complete argument

The second argument was only used by the USB gadget code, yet everyone
pays the overhead of passing a zero to be passed into aio, where it
ends up being part of the aio res2 value.

Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
diff 59bda8ec Tue Aug 31 06:18:08 MDT 2021 Miklos Szeredi <mszeredi@redhat.com> fuse: flush extending writes

Callers of fuse_writeback_range() assume that the file is ready for
modification by the server in the supplied byte range after the call
returns.

If there's a write that extends the file beyond the end of the supplied
range, then the file needs to be extended to at least the end of the range,
but currently that's not done.

There are at least two cases where this can cause problems:

- copy_file_range() will return short count if the file is not extended
up to end of the source range.

- FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
hence the region may not be fully allocated.

Fix by flushing writes from the start of the range up to the end of the
file. This could be optimized if the writes are non-extending, etc, but
it's probably not worth the trouble.

Fixes: a2bc92362941 ("fuse: fix copy_file_range() in the writeback case")
Fixes: 6b1bdb56b17c ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
Cc: <stable@vger.kernel.org> # v5.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6c88632b Tue May 25 01:40:47 MDT 2021 Wu Bo <wubo40@huawei.com> fuse: use DIV_ROUND_UP helper macro for calculations

Replace open coded divisor calculations with the DIV_ROUND_UP kernel macro
for better readability.

Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6b1bdb56 Wed May 12 10:18:48 MDT 2021 Richard W.M. Jones <rjones@redhat.com> fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)

The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE)
returning -EOPNOTSUPP. libnbd's nbdfuse would like to translate
FALLOC_FL_ZERO_RANGE requests into the NBD command
NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do
zeroing efficiently.

This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE.

A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest
nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to
create a file containing some data and "mirror" it to a fuse file:

$ dd if=/dev/urandom of=disk.img bs=1M count=1
$ nbdkit file disk.img
$ touch mirror.img
$ nbdfuse mirror.img nbd://localhost &

(mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img)

You can then run commands such as:

$ fallocate -z -o 1024 -l 1024 mirror.img

and check that the content of the original file ("disk.img") stays
synchronized. To show NBD commands, export LIBNBD_DEBUG=1 before
running nbdfuse. To clean up:

$ fusermount3 -u mirror.img
$ killall nbdkit

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6ae330ca Wed Aug 19 16:19:54 MDT 2020 Vivek Goyal <vgoyal@redhat.com> virtiofs: serialize truncate/punch_hole and dax fault path

Currently in fuse we don't seem have any lock which can serialize fault
path with truncate/punch_hole path. With dax support I need one for
following reasons.

1. Dax requirement

DAX fault code relies on inode size being stable for the duration of
fault and want to serialize with truncate/punch_hole and they explicitly
mention it.

static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
const struct iomap_ops *ops)
/*
* Check whether offset isn't beyond end of file now. Caller is
* supposed to hold locks serializing us with truncate / punch hole so
* this is a reliable test.
*/
max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

2. Make sure there are no users of pages being truncated/punch_hole

get_user_pages() might take references to page and then do some DMA
to said pages. Filesystem might truncate those pages without knowing
that a DMA is in progress or some I/O is in progress. So use
dax_layout_busy_page() to make sure there are no such references
and I/O is not in progress on said pages before moving ahead with
truncation.

3. Limitation of kvm page fault error reporting

If we are truncating file on host first and then removing mappings in
guest lateter (truncate page cache etc), then this could lead to a
problem with KVM. Say a mapping is in place in guest and truncation
happens on host. Now if guest accesses that mapping, then host will
take a fault and kvm will either exit to qemu or spin infinitely.

IOW, before we do truncation on host, we need to make sure that guest
inode does not have any mapping in that region or whole file.

4. virtiofs memory range reclaim

Soon I will introduce the notion of being able to reclaim dax memory
ranges from a fuse dax inode. There also I need to make sure that
no I/O or fault is going on in the reclaimed range and nobody is using
it so that range can be reclaimed without issues.

Currently if we take inode lock, that serializes read/write. But it does
not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
for this purpose. It can be used to serialize with faults.

As of now, I am adding taking this semaphore only in dax fault path and
not regular fault path because existing code does not have one. May
be existing code can benefit from it as well to take care of some
races, but that we can fix later if need be. For now, I am just focussing
only on DAX path which is new path.

Also added logic to take fuse_inode->i_mmap_sem in
truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
fuse dax fault are mutually exlusive and avoid all the above problems.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
diff 6ddf3af9 Tue Jul 14 06:45:41 MDT 2020 Miklos Szeredi <mszeredi@redhat.com> fuse: clean up condition for writepage sending

fuse_writepages_fill uses following construction:

if (wpa && ap->num_pages &&
(A || B || C)) {
action;
} else if (wpa && D) {
if (E) {
the same action;
}
}

- ap->num_pages check is always true and can be removed

- "if" and "else if" calls the same action and can be merged.

Move checking A, B, C, D, E conditions to a helper, add comments.

Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
/linux-master/include/linux/
H A Dfs.hdiff ddd65e19 Sat Mar 23 10:11:19 MDT 2024 Christian Brauner <brauner@kernel.org> block: handle BLK_OPEN_RESTRICT_WRITES correctly

Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.

But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.

Fix the detection logic and use an FMODE_* bit. We could've also abused
O_EXCL as an indicator that BLK_OPEN_RESTRICT_WRITES has been requested.
For userspace open paths O_EXCL will never be retained but for internal
opens where we open files that are never installed into a file
descriptor table this is fine. But it would be a gamble that this
doesn't cause bugs. Note that BLK_OPEN_RESTRICT_WRITES is an internal
only flag that cannot directly be raised by userspace. It is implicitly
raised during mounting.

Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.

Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner
Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff ae8c5117 Tue Feb 06 19:56:19 MST 2024 Kent Overstreet <kent.overstreet@linux.dev> fs: add FS_IOC_GETFSSYSFSPATH

Add a new ioctl for getting the sysfs name of a filesystem - the path
under /sys/fs.

This is going to let us standardize exporting data from sysfs across
filesystems, e.g. time stats.

The returned path will always be of the form "$FSTYP/$SYSFS_IDENTIFIER",
where the sysfs identifier may be a UUID (for bcachefs) or a device name
(xfs).

Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240207025624.1019754-6-kent.overstreet@linux.dev
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 434f8d82 Tue Oct 24 07:01:12 MDT 2023 Christian Brauner <brauner@kernel.org> fs: remove get_active_super()

This function is now unused so remove it. One less function that uses
the global superblock list.

Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-6-599c19f4faac@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
diff 61d4fb0b Tue Oct 24 16:14:37 MDT 2023 Christian Brauner <brauner@kernel.org> file, i915: fix file reference for mmap_singleton()

Today we got a report at [1] for rcu stalls on the i915 testsuite in [2]
due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict,
get_file_rcu() goes into an infinite loop trying to carefully verify
that i915->gem.mmap_singleton hasn't changed - see the splat below.

So I stared at this code to figure out what it actually does. It seems
that the i915->gem.mmap_singleton pointer itself never had rcu semantics.

The i915->gem.mmap_singleton is replaced in
file->f_op->release::singleton_release():

static int singleton_release(struct inode *inode, struct file *file)
{
struct drm_i915_private *i915 = file->private_data;

cmpxchg(&i915->gem.mmap_singleton, file, NULL);
drm_dev_put(&i915->drm);

return 0;
}

The cmpxchg() is ordered against a concurrent update of
i915->gem.mmap_singleton from mmap_singleton(). IOW, when
mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:

While mmap_singleton() does

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

it allocates a new file via anon_inode_getfile() and does

smp_store_mb(i915->gem.mmap_singleton, file);

So, then what happens in the case of this bug is that at some point
fput() is called and drops the file->f_count to zero leaving the pointer
in i915->gem.mmap_singleton in tact.

Now, there might be delays until
file->f_op->release::singleton_release() is called and
i915->gem.mmap_singleton is set to NULL.

Say concurrently another task hits mmap_singleton() and does:

rcu_read_lock();
file = get_file_rcu(&i915->gem.mmap_singleton);
rcu_read_unlock();

When get_file_rcu() fails to get a reference via atomic_inc_not_zero()
it will try the reload from i915->gem.mmap_singleton expecting it to be
NULL, assuming it has comparable semantics as we expect in
__fget_files_rcu().

But it hasn't so it reloads the same pointer again, trying the same
atomic_inc_not_zero() again and doing so until
file->f_op->release::singleton_release() of the old file has been
called.

So, in contrast to __fget_files_rcu() here we want to not retry when
atomic_inc_not_zero() has failed. We only want to retry in case we
managed to get a reference but the pointer did change on reload.

<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
<3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238
<3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20)
<6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002
<6> [511.395962] Call Trace:
<6> [511.395966] <TASK>
<6> [511.395974] ? __schedule+0x3a8/0xd70
<6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140
<6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
<6> [511.396029] ? get_file_rcu+0x10/0x30
<6> [511.396039] ? get_file_rcu+0x10/0x30
<6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915]
<6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915]
<6> [511.396903] ? mmap_region+0x253/0xb60
<6> [511.396925] ? do_mmap+0x334/0x5c0
<6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0
<6> [511.396949] ? rcu_is_watching+0x11/0x50
<6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915]
<6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915]
<6> [511.397762] ? igt_mmap+0x11e/0x150 [i915]
<6> [511.398139] ? __trace_bprintk+0x76/0x90
<6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915]
<6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915]
<6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915]
<6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915]
<6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915]
<6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915]
<6> [511.400692] ? pci_device_probe+0x95/0x120
<6> [511.400704] ? really_probe+0x164/0x3c0
<6> [511.400715] ? __pfx___driver_attach+0x10/0x10
<6> [511.400722] ? __driver_probe_device+0x73/0x160
<6> [511.400731] ? driver_probe_device+0x19/0xa0
<6> [511.400741] ? __driver_attach+0xb6/0x180
<6> [511.400749] ? __pfx___driver_attach+0x10/0x10
<6> [511.400756] ? bus_for_each_dev+0x77/0xd0
<6> [511.400770] ? bus_add_driver+0x114/0x210
<6> [511.400781] ? driver_register+0x5b/0x110
<6> [511.400791] ? i915_init+0x23/0xc0 [i915]
<6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915]
<6> [511.401503] ? do_one_initcall+0x57/0x270
<6> [511.401515] ? rcu_is_watching+0x11/0x50
<6> [511.401521] ? kmalloc_trace+0xa3/0xb0
<6> [511.401532] ? do_init_module+0x5f/0x210
<6> [511.401544] ? load_module+0x1d00/0x1f60
<6> [511.401581] ? init_module_from_file+0x86/0xd0
<6> [511.401590] ? init_module_from_file+0x86/0xd0
<6> [511.401613] ? idempotent_init_module+0x17c/0x230
<6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0
<6> [511.401650] ? do_syscall_64+0x3c/0x90
<6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
<6> [511.401684] </TASK>

Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com
Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963
Cc: Jann Horn <jannh@google.com>,
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>

Completed in 1770 milliseconds