History log of /linux-master/drivers/dax/super.c
Revision Date Author Comments
# d9212b35 24-Feb-2024 Chengming Zhou <zhouchengming@bytedance.com>

dax: remove SLAB_MEM_SPREAD flag usage

The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was
removed as of v6.8-rc1, so it became a dead flag since the commit
16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the
series[1] went on to mark it obsolete explicitly to avoid confusion
for users. Here we can just remove all its users, which has no any
functional change.

[1] https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20240224134728.829289-1-chengming.zhou@linux.dev
Signed-off-by: Dave Jiang <dave.jiang@intel.com>


# 902ccb86 15-Feb-2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

dax: fix incorrect list of data cache aliasing architectures

commit d92576f1167c ("dax: does not work correctly with virtual aliasing caches")
prevents DAX from building on architectures with virtually aliased
dcache with:

depends on !(ARM || MIPS || SPARC)

This check is too broad (e.g. recent ARMv7 don't have virtually aliased
dcaches), and also misses many other architectures with virtually
aliased data cache.

This is a regression introduced in the v4.0 Linux kernel where the
dax mount option is removed for 32-bit ARMv7 boards which have no data
cache aliasing, and therefore should work fine with FS_DAX.

This was turned into the following check in alloc_dax() by a preparatory
change:

if (ops && (IS_ENABLED(CONFIG_ARM) ||
IS_ENABLED(CONFIG_MIPS) ||
IS_ENABLED(CONFIG_SPARC)))
return NULL;

Use cpu_dcache_is_aliasing() instead to figure out whether the environment
has aliasing data caches.

Link: https://lkml.kernel.org/r/20240215144633.96437-10-mathieu.desnoyers@efficios.com
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 1df4ca01 15-Feb-2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

dax: check for data cache aliasing at runtime

Replace the following fs/Kconfig:FS_DAX dependency:

depends on !(ARM || MIPS || SPARC)

By a runtime check within alloc_dax(). This runtime check returns
ERR_PTR(-EOPNOTSUPP) if the @ops parameter is non-NULL (which means
the kernel is using an aliased mapping) on an architecture which
has data cache aliasing.

Change the return value from NULL to PTR_ERR(-EOPNOTSUPP) for
CONFIG_DAX=n for consistency.

This is done in preparation for using cpu_dcache_is_aliasing() in a
following change which will properly support architectures which detect
data cache aliasing at runtime.

Link: https://lkml.kernel.org/r/20240215144633.96437-8-mathieu.desnoyers@efficios.com
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 6d439c18 15-Feb-2024 Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

dax: alloc_dax() return ERR_PTR(-EOPNOTSUPP) for CONFIG_DAX=n

Change the return value from NULL to PTR_ERR(-EOPNOTSUPP) for
CONFIG_DAX=n to be consistent with the fact that CONFIG_DAX=y
never returns NULL.

This is done in preparation for using cpu_dcache_is_aliasing() in a
following change which will properly support architectures which detect
data cache aliasing at runtime.

Link: https://lkml.kernel.org/r/20240215144633.96437-3-mathieu.desnoyers@efficios.com
Fixes: 4e4ced93794a ("dax: Move mandatory ->zero_page_range() check in alloc_dax()")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# f88c3fb8 12-Mar-2024 Linus Torvalds <torvalds@linux-foundation.org>

mm, slab: remove last vestiges of SLAB_MEM_SPREAD

Yes, yes, I know the slab people were planning on going slow and letting
every subsystem fight this thing on their own. But let's just rip off
the band-aid and get it over and done with. I don't want to see a
number of unnecessary pull requests just to get rid of a flag that no
longer has any meaning.

This was mainly done with a couple of 'sed' scripts and then some manual
cleanup of the end result.

Link: https://lore.kernel.org/all/CAHk-=wji0u+OOtmAOD-5JV3SXcRJF___k_+8XNKmak0yd5vW1Q@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fa422b35 23-Oct-2023 Shiyang Ruan <ruansy.fnst@fujitsu.com>

mm, pmem, xfs: Introduce MF_MEM_PRE_REMOVE for unbind

Now, if we suddenly remove a PMEM device(by calling unbind) which
contains FSDAX while programs are still accessing data in this device,
e.g.:
```
$FSSTRESS_PROG -d $SCRATCH_MNT -n 99999 -p 4 &
# $FSX_PROG -N 1000000 -o 8192 -l 500000 $SCRATCH_MNT/t001 &
echo "pfn1.1" > /sys/bus/nd/drivers/nd_pmem/unbind
```
it could come into an unacceptable state:
1. device has gone but mount point still exists, and umount will fail
with "target is busy"
2. programs will hang and cannot be killed
3. may crash with NULL pointer dereference

To fix this, we introduce a MF_MEM_PRE_REMOVE flag to let it know that we
are going to remove the whole device, and make sure all related processes
could be notified so that they could end up gracefully.

This patch is inspired by Dan's "mm, dax, pmem: Introduce
dev_pagemap_failure()"[1]. With the help of dax_holder and
->notify_failure() mechanism, the pmem driver is able to ask filesystem
on it to unmap all files in use, and notify processes who are using
those files.

Call trace:
trigger unbind
-> unbind_store()
-> ... (skip)
-> devres_release_all()
-> kill_dax()
-> dax_holder_notify_failure(dax_dev, 0, U64_MAX, MF_MEM_PRE_REMOVE)
-> xfs_dax_notify_failure()
`-> freeze_super() // freeze (kernel call)
`-> do xfs rmap
` -> mf_dax_kill_procs()
` -> collect_procs_fsdax() // all associated processes
` -> unmap_and_kill()
` -> invalidate_inode_pages2_range() // drop file's cache
`-> thaw_super() // thaw (both kernel & user call)

Introduce MF_MEM_PRE_REMOVE to let filesystem know this is a remove
event. Use the exclusive freeze/thaw[2] to lock the filesystem to prevent
new dax mapping from being created. Do not shutdown filesystem directly
if configuration is not supported, or if failure range includes metadata
area. Make sure all files and processes(not only the current progress)
are handled correctly. Also drop the cache of associated files before
pmem is removed.

[1]: https://lore.kernel.org/linux-mm/161604050314.1463742.14151665140035795571.stgit@dwillia2-desk3.amr.corp.intel.com/
[2]: https://lore.kernel.org/linux-xfs/169116275623.3187159.16862410128731457358.stg-ugh@frogsfrogsfrogs/

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>


# 1ea7ca1b 15-Jun-2023 Jane Chu <jane.chu@oracle.com>

dax: enable dax fault handler to report VM_FAULT_HWPOISON

When multiple processes mmap() a dax file, then at some point,
a process issues a 'load' and consumes a hwpoison, the process
receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
set for the poison scope. Soon after, any other process issues
a 'load' to the poisoned page (that is unmapped from the kernel
side by memory_failure), it receives a SIGBUS with
si_code = BUS_ADRERR and without valid si_lsb.

This is confusing to user, and is different from page fault due
to poison in RAM memory, also some helpful information is lost.

Channel dax backend driver's poison detection to the filesystem
such that instead of reporting VM_FAULT_SIGBUS, it could report
VM_FAULT_HWPOISON.

If user level block IO syscalls fail due to poison, the errno will
be converted to EIO to maintain block API consistency.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>


# 1c88b9ba 17-Jan-2023 Randy Dunlap <rdunlap@infradead.org>

dax: super.c: fix kernel-doc bad line warning

Convert an empty line to " *" to avoid a kernel-doc warning:

drivers/dax/super.c:478: warning: bad line:

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: nvdimm@lists.linux.dev
Link: https://lore.kernel.org/r/20230117070249.31934-1-rdunlap@infradead.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 0f702033 25-Sep-2022 Bo Liu <liubo03@inspur.com>

dax: Remove usage of the deprecated ida_simple_xxx API

ida_alloc_max() makes it clear that the second argument is inclusive,
and the alloc/free terminology is more idiomatic and symmetric then
get/remove.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220926012635.3205-1-liubo03@inspur.com
[djbw: reword changelog]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 8012b866 02-Jun-2022 Shiyang Ruan <ruansy.fnst@fujitsu.com>

dax: introduce holder for dax_device

Patch series "v14 fsdax-rmap + v11 fsdax-reflink", v2.

The patchset fsdax-rmap is aimed to support shared pages tracking for
fsdax.

It moves owner tracking from dax_assocaite_entry() to pmem device driver,
by introducing an interface ->memory_failure() for struct pagemap. This
interface is called by memory_failure() in mm, and implemented by pmem
device.

Then call holder operations to find the filesystem which the corrupted
data located in, and call filesystem handler to track files or metadata
associated with this page.

Finally we are able to try to fix the corrupted data in filesystem and do
other necessary processing, such as killing processes who are using the
files affected.

The call trace is like this:
memory_failure()
|* fsdax case
|------------
|pgmap->ops->memory_failure() => pmem_pgmap_memory_failure()
| dax_holder_notify_failure() =>
| dax_device->holder_ops->notify_failure() =>
| - xfs_dax_notify_failure()
| |* xfs_dax_notify_failure()
| |--------------------------
| | xfs_rmap_query_range()
| | xfs_dax_failure_fn()
| | * corrupted on metadata
| | try to recover data, call xfs_force_shutdown()
| | * corrupted on file data
| | try to recover data, call mf_dax_kill_procs()
|* normal case
|-------------
|mf_generic_kill_procs()


The patchset fsdax-reflink attempts to add CoW support for fsdax, and
takes XFS, which has both reflink and fsdax features, as an example.

One of the key mechanisms needed to be implemented in fsdax is CoW. Copy
the data from srcmap before we actually write data to the destination
iomap. And we just copy range in which data won't be changed.

Another mechanism is range comparison. In page cache case, readpage() is
used to load data on disk to page cache in order to be able to compare
data. In fsdax case, readpage() does not work. So, we need another
compare data with direct access support.

With the two mechanisms implemented in fsdax, we are able to make reflink
and fsdax work together in XFS.


This patch (of 14):

To easily track filesystem from a pmem device, we introduce a holder for
dax_device structure, and also its operation. This holder is used to
remember who is using this dax_device:

- When it is the backend of a filesystem, the holder will be the
instance of this filesystem.
- When this pmem device is one of the targets in a mapped device, the
holder will be this mapped device. In this case, the mapped device
has its own dax_device and it will follow the first rule. So that we
can finally track to the filesystem we needed.

The holder and holder_ops will be set when filesystem is being mounted,
or an target device is being activated.

Link: https://lkml.kernel.org/r/20220603053738.1218681-1-ruansy.fnst@fujitsu.com
Link: https://lkml.kernel.org/r/20220603053738.1218681-2-ruansy.fnst@fujitsu.com
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.wiliams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>


# 047218ec 22-Apr-2022 Jane Chu <jane.chu@oracle.com>

dax: add .recovery_write dax_operation

Introduce dax_recovery_write() operation. The function is used to
recover a dax range that contains poison. Typical use case is when
a user process receives a SIGBUS with si_code BUS_MCEERR_AR
indicating poison(s) in a dax range, in response, the user process
issues a pwrite() to the page-aligned dax range, thus clears the
poison and puts valid data in the range.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e511c4a3 13-May-2022 Jane Chu <jane.chu@oracle.com>

dax: introduce DAX_RECOVERY_WRITE dax access mode

Up till now, dax_direct_access() is used implicitly for normal
access, but for the purpose of recovery write, dax range with
poison is requested. To make the interface clear, introduce
enum dax_access_mode {
DAX_ACCESS,
DAX_RECOVERY_WRITE,
}
where DAX_ACCESS is used for normal dax access, and
DAX_RECOVERY_WRITE is used for dax recovery write.

Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# db8cd5ef 04-Mar-2022 Ira Weiny <ira.weiny@intel.com>

dax: Fix missing kdoc for dax_device

struct dax_device has a member named ops which was undocumented.

Add the kdoc.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220304204655.3489216-1-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# a7e8de82 12-Feb-2022 Tong Zhang <ztong0001@gmail.com>

dax: make sure inodes are flushed before destroy cache

A bug can be triggered by following command

$ modprobe nd_pmem && modprobe -r nd_pmem

[ 10.060014] BUG dax_cache (Not tainted): Objects remaining in dax_cache on __kmem_cache_shutdown()
[ 10.060938] Slab 0x0000000085b729ac objects=9 used=1 fp=0x000000004f5ae469 flags=0x200000000010200(slab|head|node)
[ 10.062433] Call Trace:
[ 10.062673] dump_stack_lvl+0x34/0x44
[ 10.062865] slab_err+0x90/0xd0
[ 10.063619] __kmem_cache_shutdown+0x13b/0x2f0
[ 10.063848] kmem_cache_destroy+0x4a/0x110
[ 10.064058] __x64_sys_delete_module+0x265/0x300

This is caused by dax_fs_exit() not flushing inodes before destroy cache.
To fix this issue, call rcu_barrier() before destroy cache.

Signed-off-by: Tong Zhang <ztong0001@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220212071111.148575-1-ztong0001@gmail.com
Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# fd60b288 22-Mar-2022 Muchun Song <songmuchun@bytedance.com>

fs: allocate inode by using alloc_inode_sb()

The inode allocation is supposed to use alloc_inode_sb(), so convert
kmem_cache_alloc() of all filesystems to alloc_inode_sb().

Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4]
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7ac5360c 15-Dec-2021 Christoph Hellwig <hch@lst.de>

dax: remove the copy_from_iter and copy_to_iter methods

These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.

Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove indirect call from the read/write fast path
as well as a lot of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 30c6828a 15-Dec-2021 Christoph Hellwig <hch@lst.de>

dax: remove the DAXDEV_F_SYNC flag

Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and
just let the drivers call set_dax_synchronous directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# fd1d00ec 15-Dec-2021 Christoph Hellwig <hch@lst.de>

dax: simplify dax_synchronous and set_dax_synchronous

Remove the pointless wrappers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20211215084508.435401-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# cd913c76 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dax: return the partition offset from fs_dax_get_by_bdev

Prepare for the removal of the block_device from the DAX I/O path by
returning the partition offset from fs_dax_get_by_bdev so that the file
systems have it at hand for use during I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 60696eb2 29-Nov-2021 Christoph Hellwig <hch@lst.de>

fsdax: simplify the pgoff calculation

Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
single dax_iomap_pgoff helper that avoids lots of cumbersome sector
conversions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-15-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 7b0800d0 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dax: remove dax_capable

Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs]
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 0c445871 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dax: move the partition alignment check into fs_dax_get_by_bdev

fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-7-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 586f6168 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dax: remove the pgmap sanity checks in generic_fsdax_supported

Drivers that register a dax_dev should make sure it works, no need
to double check from the file system.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-6-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# fb08a190 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dax: simplify the dax_device <-> gendisk association

Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicitly calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 5d2a228b 29-Nov-2021 Christoph Hellwig <hch@lst.de>

dm: make the DAX support depend on CONFIG_FS_DAX

The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax. Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e765f13e 22-Sep-2021 Christoph Hellwig <hch@lst.de>

nvdimm/pmem: move dax_attribute_group from dax to pmem

dax_attribute_group is only used by the pmem driver, and can avoid the
completely pointless lookup by the disk name if moved there. This
leaves just a single caller of dax_get_by_host, so move dax_get_by_host
into the same ifdef block as that caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210922173431.2454024-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# bdd3c50d 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: remove bdev_dax_supported

All callers already have a dax_device obtained from fs_dax_get_by_bdev
at hand, so just pass that to dax_supported() insted of doing another
lookup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-10-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 60b8340f 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: stub out dax_supported for !CONFIG_FS_DAX

dax_supported calls into ->dax_supported which checks for fsdax support.
Don't bother building it for !CONFIG_FS_DAX as it will always return
false.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-8-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# cd93a2a4 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: remove __generic_fsdax_supported

Just implement generic_fsdax_supported directly out of line instead of
adding a wrapper. Given that generic_fsdax_supported is only supplied
for CONFIG_FS_DAX builds this also allows to not provide it at all for
!CONFIG_FS_DAX builds.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-7-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 673a0658 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: move the dax_read_lock() locking into dax_supported

Move the dax_read_lock/dax_read_unlock pair from the callers into
dax_supported to make it a little easier to use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-6-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 1b764601 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: mark dax_get_by_host static

And move the code around a bit to avoid a forward declaration.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 39b6389a 26-Aug-2021 Christoph Hellwig <hch@lst.de>

dax: stop using bdevname

Just use the %pg format specifier instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20210826135510.6293-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# b05d4c57 25-May-2021 Ira Weiny <ira.weiny@intel.com>

dax: Ensure errno is returned from dax_direct_access

If the caller specifies a negative nr_pages that is an invalid
parameter.

Return -EINVAL to ensure callers get an errno if they want to check it.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20210525172428.3634316-4-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 6f24784f 31-Jan-2021 Al Viro <viro@zeniv.linux.org.uk>

whack-a-mole: don't open-code iminor/imajor

several instances creeped back into the tree...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1aa57431 01-Dec-2020 Wang Hai <wanghai38@huawei.com>

device-dax/core: Fix memory leak when rmmod dax.ko

When I repeatedly modprobe and rmmod dax.ko, kmemleak report a
memory leak as follows:

unreferenced object 0xffff9a5588c05088 (size 8):
comm "modprobe", pid 261, jiffies 4294693644 (age 42.063s)
...
backtrace:
[<00000000e007ced0>] kstrdup+0x35/0x70
[<000000002ae73897>] kstrdup_const+0x3d/0x50
[<000000002b00c9c3>] kvasprintf_const+0xbc/0xf0
[<000000008023282f>] kobject_set_name_vargs+0x3b/0xd0
[<00000000d2cbaa4e>] kobject_set_name+0x62/0x90
[<00000000202e7a22>] bus_register+0x7f/0x2b0
[<000000000b77792c>] 0xffffffffc02840f7
[<000000002d5be5ac>] 0xffffffffc02840b4
[<00000000dcafb7cd>] do_one_initcall+0x58/0x240
[<00000000049fe480>] do_init_module+0x56/0x1e2
[<0000000022671491>] load_module+0x2517/0x2840
[<000000001a2201cb>] __do_sys_finit_module+0x9c/0xe0
[<000000003eb304e7>] do_syscall_64+0x33/0x40
[<0000000051c5fd06>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

When rmmod dax is executed, dax_bus_exit() is missing. This patch
can fix this bug.

Fixes: 9567da0b408a ("device-dax: Introduce bus + driver model")
Cc: <stable@vger.kernel.org>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Link: https://lore.kernel.org/r/20201201135929.66530-1-wanghai38@huawei.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# d4c5da50 17-Sep-2020 Adrian Huang <ahuang12@lenovo.com>

dax: Fix stack overflow when mounting fsdax pmem device

When mounting fsdax pmem device, commit 6180bb446ab6 ("dax: fix
detection of dax support for non-persistent memory block devices")
introduces the stack overflow [1][2]. Here is the call path for
mounting ext4 file system:
ext4_fill_super
bdev_dax_supported
__bdev_dax_supported
dax_supported
generic_fsdax_supported
__generic_fsdax_supported
bdev_dax_supported

The call path leads to the infinite calling loop, so we cannot
call bdev_dax_supported() in __generic_fsdax_supported(). The sanity
checking of the variable 'dax_dev' is moved prior to the two
bdev_dax_pgoff() checks [3][4].

[1] https://lore.kernel.org/linux-nvdimm/1420999447.1004543.1600055488770.JavaMail.zimbra@redhat.com/
[2] https://lore.kernel.org/linux-nvdimm/alpine.LRH.2.02.2009141131220.30651@file01.intranet.prod.int.rdu2.redhat.com/
[3] https://lore.kernel.org/linux-nvdimm/CA+RJvhxBHriCuJhm-D8NvJRe3h2MLM+ZMFgjeJjrRPerMRLvdg@mail.gmail.com/
[4] https://lore.kernel.org/linux-nvdimm/20200903160608.GU878166@iweiny-DESK2.sc.intel.com/

Fixes: 6180bb446ab6 ("dax: fix detection of dax support for non-persistent memory block devices")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: Coly Li <colyli@suse.de>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: John Pittman <jpittman@redhat.com>
Link: https://lore.kernel.org/r/20200917111549.6367-1-adrianhuang0701@gmail.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e2ec5128 20-Sep-2020 Jan Kara <jack@suse.cz>

dm: Call proper helper to determine dax support

DM was calling generic_fsdax_supported() to determine whether a device
referenced in the DM table supports DAX. However this is a helper for "leaf" device drivers so that
they don't have to duplicate common generic checks. High level code
should call dax_supported() helper which that calls into appropriate
helper for the particular device. This problem manifested itself as
kernel messages:

dm-3: error: dax access failed (-95)

when lvm2-testsuite run in cases where a DM device was stacked on top of
another DM device.

Fixes: 7bf7eac8d648 ("dax: Arrange for dax_supported check to span multiple devices")
Cc: <stable@vger.kernel.org>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/160061715195.13131.5503173247632041975.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 1a9d5d40 19-Aug-2020 Vivek Goyal <vgoyal@redhat.com>

dax: Modify bdev_dax_pgoff() to handle NULL bdev

virtiofs does not have a block device but it has dax device.
Modify bdev_dax_pgoff() to be able to handle that.

If there is no bdev, that means dax offset is 0. (It can't be a partition
block device starting at an offset in dax device).

This is little hackish. There have been discussions about getting rid
of dax not supporting partitions.

https://lore.kernel.org/linux-fsdevel/20200107125159.GA15745@infradead.org/

IMHO, this path can easily break exisitng users. For example
ioctl(BLKPG_ADD_PARTITION) will start breaking on block devices
supporting DAX. Also, I personally find it very useful to be able to
partition dax devices and still be able to use DAX.

Alternatively, I tried to store offset into dax device information in iomap
interface, but that got NACKed.

https://lore.kernel.org/linux-fsdevel/20200217133117.GB20444@infradead.org/

I can't think of a good path to solve this issue properly. So to make
progress, it seems this patch is least bad option for now and I hope
we can take it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: "Weiny, Ira" <ira.weiny@intel.com>
Cc: linux-nvdimm@lists.01.org
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>


# 6180bb44 03-Sep-2020 Coly Li <colyli@suse.de>

dax: fix detection of dax support for non-persistent memory block devices

When calling __generic_fsdax_supported(), a dax-unsupported device may
not have dax_dev as NULL, e.g. the dax related code block is not enabled
by Kconfig.

Therefore in __generic_fsdax_supported(), to check whether a device
supports DAX or not, the following order of operations should be
performed:
- If dax_dev pointer is NULL, it means the device driver explicitly
announce it doesn't support DAX. Then it is OK to directly return
false from __generic_fsdax_supported().
- If dax_dev pointer is NOT NULL, it might be because the driver doesn't
support DAX and not explicitly initialize related data structure. Then
bdev_dax_supported() should be called for further check.

If device driver desn't explicitly set its dax_dev pointer to NULL,
this is not a bug. Calling bdev_dax_supported() makes sure they can be
recognized as dax-unsupported eventually.

Fixes: c2affe920b0e ("dax: do not print error message for non-persistent memory block device")
Cc: Jan Kara <jack@suse.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-and-tested-by: Adrian Huang <ahuang12@lenovo.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/20200903161625.19524-1-colyli@suse.de


# c2affe92 19-Aug-2020 Adrian Huang <ahuang12@lenovo.com>

dax: do not print error message for non-persistent memory block device

Commit 231609785cbf ("dax: print error message by pr_info()
in __generic_fsdax_supported()") happens to print the following
error message during booting when the non-persistent memory block
devices are configured by device mapper. Those error messages are
caused by the variable 'dax_dev' is NULL. Users might be confused
with those error messages since they do not use the persistent
memory device. Moreover, users might scare about "what's wrong
with my disks" because they see the 'error' and 'failed' keywords.

# dmesg | grep fail
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdk3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)
sdb3: error: dax access failed (-95)

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.1T 0 disk
├─sda1 8:1 0 156M 0 part
├─sda2 8:2 0 40G 0 part
└─sda3 8:3 0 1.1T 0 part
sdb 8:16 0 1.1T 0 disk
├─sdb1 8:17 0 600M 0 part
├─sdb2 8:18 0 1G 0 part
└─sdb3 8:19 0 1.1T 0 part
├─rhel00-swap 254:3 0 4G 0 lvm
├─rhel00-home 254:4 0 1T 0 lvm
└─rhel00-root 254:5 0 50G 0 lvm
sdc 8:32 0 1.1T 0 disk
sdd 8:48 0 1.1T 0 disk
sde 8:64 0 1.1T 0 disk
sdf 8:80 0 1.1T 0 disk
sdg 8:96 0 1.1T 0 disk
sdh 8:112 0 3.3T 0 disk
├─sdh1 8:113 0 500M 0 part /boot/efi
├─sdh2 8:114 0 40G 0 part /
├─sdh3 8:115 0 2.9T 0 part /home
└─sdh4 8:116 0 314.6G 0 part [SWAP]
sdi 8:128 0 1.1T 0 disk
sdj 8:144 0 3.3T 0 disk
├─sdj1 8:145 0 512M 0 part
└─sdj2 8:146 0 3.3T 0 part
sdk 8:160 0 119.2G 0 disk
├─sdk1 8:161 0 200M 0 part
├─sdk2 8:162 0 1G 0 part
└─sdk3 8:163 0 118G 0 part
├─rhel-swap 254:0 0 4G 0 lvm
├─rhel-home 254:1 0 64G 0 lvm
└─rhel-root 254:2 0 50G 0 lvm
sdl 8:176 0 119.2G 0 disk

The call path is shown as follows:
dm_table_determine_type
dm_table_supports_dax
device_supports_dax
generic_fsdax_supported
__generic_fsdax_supported

With the disk configuration listing from the command 'lsblk',
the member 'dev->dax_dev' of the block devices 'sdb3' and 'sdk3'
(configured by device mapper) is NULL in function
generic_fsdax_supported() because the member is configured in
function open_table_device().

To prevent the confusing error messages in this scenario (this is
normal behavior), just print those error messages by pr_debug()
by checking if dax_dev is NULL and the block device does not support
DAX.

Link: https://lore.kernel.org/r/20200819154236.24191-1-adrianhuang0701@gmail.com
Fixes: 231609785cbf ("dax: print error message by pr_info() in __generic_fsdax_supported()")
Cc: Coly Li <colyli@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>


# eedfd73d 17-Jul-2020 Ira Weiny <ira.weiny@intel.com>

drivers/dax: Expand lock scope to cover the use of addresses

The addition of PKS protection to dax read lock/unlock will require that
the address returned by dax_direct_access() be protected by this lock.

Correct the locking by ensuring that the use of kaddr and end_kaddr
are covered by the dax read lock/unlock.

Link: https://lore.kernel.org/r/20200717072056.73134-12-ira.weiny@intel.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>


# 23160978 25-Jul-2020 Coly Li <colyli@suse.de>

dax: print error message by pr_info() in __generic_fsdax_supported()

In struct dax_operations, the callback routine dax_supported() returns
a bool type result. For false return value, the caller has no idea
whether the device does not support dax at all, or it is just some mis-
configuration issue.

An example is formatting an Ext4 file system on pmem device on top of
a NVDIMM namespace by,
# mkfs.ext4 /dev/pmem0
If the fs block size does not match kernel space memory page size (which
is possible on non-x86 platform), mount this Ext4 file system will fail,
# mount -o dax /dev/pmem0 /mnt
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/pmem0,
missing codepage or helper program, or other error.
And from the dmesg output there is only the following information,
[ 307.853148] EXT4-fs (pmem0): DAX unsupported by block device.

The above information is quite confusing. Because definitely the pmem0
device supports dax operation, and the super block is consistent as how
it was created by mkfs.ext4.

Indeed the failure is from __generic_fsdax_supported() by the following
code piece,
if (blocksize != PAGE_SIZE) {
pr_debug("%s: error: unsupported blocksize for dax\n",
bdevname(bdev, buf));
return false;
}
It is because the Ext4 block size is 4KB and kernel page size is 8KB or
16KB.

It is not simple to make dax_supported() from struct dax_operations
or __generic_fsdax_supported() to return exact failure type right now.
So the simplest fix is to use pr_info() to print all the error messages
inside __generic_fsdax_supported(). Then users may find informative clue
from the kernel message at least.

Message printed by pr_debug() is very easy to be ignored by users. This
patch prints error message by pr_info() in __generic_fsdax_supported(),
when then mount fails, following lines can be found from dmesg output,
[ 2705.500885] pmem0: error: unsupported blocksize for dax
[ 2705.500888] EXT4-fs (pmem0): DAX unsupported by block device.
Now the users may have idea the mount failure is from pmem driver for
unsupported block size.

Link: https://lore.kernel.org/r/20200725162450.95999-1-colyli@suse.de
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anthony Iliopoulos <ailiopoulos@suse.com>
Reported-by: Michal Suchanek <msuchanek@suse.com>
Suggested-by: Jan Kara <jack@suse.com>
Reviewed-by: Jan Kara <jack@suse.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>


# e556f6ba 26-Jun-2020 Christoph Hellwig <hch@lst.de>

block: remove the bd_queue field from struct block_device

Just use bd_disk->queue instead.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 4e4ced93 31-Mar-2020 Vivek Goyal <vgoyal@redhat.com>

dax: Move mandatory ->zero_page_range() check in alloc_dax()

zero_page_range() dax operation is mandatory for dax devices. Right now
that check happens in dax_zero_page_range() function. Dan thinks that's
too late and its better to do the check earlier in alloc_dax().

I also modified alloc_dax() to return pointer with error code in it in
case of failure. Right now it returns NULL and caller assumes failure
happened due to -ENOMEM. But with this ->zero_page_range() check, I
need to return -EINVAL instead.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# f605a263 28-Feb-2020 Vivek Goyal <vgoyal@redhat.com>

dax, pmem: Add a dax operation zero_page_range

Add a dax operation zero_page_range, to zero a page. This will also clear any
known poison in the page being zeroed.

As of now, zeroing of one page is allowed in a single call. There
are no callers which are trying to zero more than a page in a single call.
Once we grow the callers which zero more than a page in single call, we
can add that support. Primary reason for not doing that yet is that this
will add little complexity in dm implementation where a range might be
spanning multiple underlying targets and one will have to split the range
into multiple sub ranges and call zero_page_range() on individual targets.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Link: https://lore.kernel.org/r/20200228163456.1587-3-vgoyal@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# f01b16a8 06-Jan-2020 Vivek Goyal <vgoyal@redhat.com>

dax: Get rid of fs_dax_get_by_host() helper

Looks like nobody is using fs_dax_get_by_host() except fs_dax_get_by_bdev()
and it can easily use dax_get_by_host() instead.

IIUC, fs_dax_get_by_host() was only introduced so that one could compile
with CONFIG_FS_DAX=n and CONFIG_DAX=m. fs_dax_get_by_bdev() achieves
the same purpose and hence it looks like fs_dax_get_by_host() is not
needed anymore.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20200106181117.GA16248@redhat.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# fefc1d97 05-Jul-2019 Pankaj Gupta <pagupta@redhat.com>

libnvdimm: add dax_dev sync flag

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 5b497af4 29-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of version 2 of the gnu general public license as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 64 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.894819585@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 75d4e06f 25-Mar-2019 David Howells <dhowells@redhat.com>

vfs: Convert dax to use the new mount API

Convert the dax filesystem to the new internal mount API as the old
one will be obsoleted and removed. This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dan Williams <dan.j.williams@intel.com>
cc: Vishal Verma <vishal.l.verma@intel.com>
cc: Keith Busch <keith.busch@intel.com>
cc: Dave Jiang <dave.jiang@intel.com>
cc: linux-nvdimm@lists.01.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1f58bb18 20-May-2019 Al Viro <viro@zeniv.linux.org.uk>

mount_pseudo(): drop 'name' argument, switch to d_make_root()

Once upon a time we used to set ->d_name of e.g. pipefs root
so that d_path() on pipes would work. These days it's
completely pointless - dentries of pipes are not even connected
to pipefs root. However, mount_pseudo() had set the root
dentry name (passed as the second argument) and callers
kept inventing names to pass to it. Including those that
didn't *have* any non-root dentries to start with...

All of that had been pointless for about 8 years now; it's
time to get rid of that cargo-culting...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 1a6e9e76 20-May-2019 Dan Williams <dan.j.williams@intel.com>

device-dax: Drop register_filesystem()

The device-dax fs is only there to allocate a common inode for each
device-node that refers to the same device by major:minor. It is
otherwise not user mountable and need not be displayed in
/proc/filesystems.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 7bf7eac8 16-May-2019 Dan Williams <dan.j.williams@intel.com>

dax: Arrange for dax_supported check to span multiple devices

Pankaj reports that starting with commit ad428cdb525a "dax: Check the
end of the block-device capacity with dax_direct_access()" device-mapper
no longer allows dax operation. This results from the stricter checks in
__bdev_dax_supported() that validate that the start and end of a
block-device map to the same 'pagemap' instance.

Teach the dax-core and device-mapper to validate the 'pagemap' on a
per-target basis. This is accomplished by refactoring the
bdev_dax_supported() internals into generic_fsdax_supported() which
takes a sector range to validate. Consequently generic_fsdax_supported()
is suitable to be used in a device-mapper ->iterate_devices() callback.
A new ->dax_supported() operation is added to allow composite devices to
split and route upper-level bdev_dax_supported() requests.

Fixes: ad428cdb525a ("dax: Check the end of the block-device...")
Cc: <stable@vger.kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Pankaj Gupta <pagupta@redhat.com>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 53e22829 10-Apr-2019 Al Viro <viro@zeniv.linux.org.uk>

dax: make use of ->free_inode()

we might want to drop ->destroy_inode() there - it's used only for
WARN_ON() now, and AFAICS that could be moved to ->evict_inode()
if we had one...

Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# ad428cdb 20-Feb-2019 Dan Williams <dan.j.williams@intel.com>

dax: Check the end of the block-device capacity with dax_direct_access()

The checks in __bdev_dax_supported() helped mitigate a potential data
corruption bug in the pmem driver's handling of section alignment
padding. Strengthen the checks, including checking the end of the range,
to validate the dev_pagemap, Xarray entries, and sector-to-pfn
translation established for pmem namespaces.

Acked-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9567da0b 12-Jul-2017 Dan Williams <dan.j.williams@intel.com>

device-dax: Introduce bus + driver model

In support of multiple device-dax instances per device-dax-region and
allowing the 'kmem' driver to attach to dax-instances instead of the
current device-node access, convert the dax sub-system from a class to a
bus. Recall that the kmem driver takes reserved / special purpose
memories and assigns them to be managed by the core-mm.

Aside from the fact the device-dax instances are registered and probed
on a bus, two other lifetime-management changes are made:

1/ Delay attaching a cdev until driver probe time

2/ A new run_dax() helper is introduced to allow restoring dax-operation
after a kill_dax() event. So, at driver ->probe() time we run_dax()
and at ->remove() time we kill_dax() and invalidate all mappings.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 51cf784c 12-Jul-2017 Dan Williams <dan.j.williams@intel.com>

device-dax: Start defining a dax bus model

Towards eliminating the dax_class, move the dax-device-attribute
enabling to a new bus.c file in the core. The amount of code
thrash of sub-sequent patches is reduced as no logic changes are made,
just pure code movement.

A temporary export of unregister_dex_dax() and dax_attribute_groups is
needed to preserve compilation, but those symbols become static again in
a follow-on patch.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e0b401e3 30-Jul-2018 Huaisheng Ye <yehs1@lenovo.com>

dax/super: Do not request a pointer kaddr when not required

Function __bdev_dax_supported doesn't need to get local pointer kaddr
from direct_access. Using NULL instead of having to pass in a useless
local pointer that caller then just throw away.

Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>


# 15256f6c 26-Jun-2018 Ross Zwisler <zwisler@kernel.org>

dax: check for QUEUE_FLAG_DAX in bdev_dax_supported()

Add an explicit check for QUEUE_FLAG_DAX to __bdev_dax_supported(). This
is needed for DM configurations where the first element in the dm-linear or
dm-stripe target supports DAX, but other elements do not. Without this
check __bdev_dax_supported() will pass for such devices, letting a
filesystem on that device mount with the DAX option.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Mike Snitzer <snitzer@redhat.com>
Fixes: commit 545ed20e6df6 ("dm: add infrastructure for DAX support")
Cc: stable@vger.kernel.org
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 808c340b 06-Jun-2018 Ross Zwisler <zwisler@kernel.org>

dax: Use dax_write_cache* helpers

Use dax_write_cache() and dax_write_cache_enabled() instead of open coding
the bit operations.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 80660f20 30-May-2018 Dave Jiang <dave.jiang@intel.com>

dax: change bdev_dax_supported() to support boolean returns

The function return values are confusing with the way the function is
named. We expect a true or false return value but it actually returns
0/-errno. This makes the code very confusing. Changing the return values
to return a bool where if DAX is supported then return true and no DAX
support returns false.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>


# ba23cba9 30-May-2018 Darrick J. Wong <darrick.wong@oracle.com>

fs: allow per-device dax status checking for filesystems

Change bdev_dax_supported so it takes a bdev parameter. This enables
multi-device filesystems like xfs to check that a dax device can work for
the particular filesystem. Once that's in place, actually fix all the
parts of XFS where we need to be able to distinguish between datadev and
rtdev.

This patch fixes the problem where we screw up the dax support checking
in xfs if the datadev and rtdev have different dax capabilities.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[rez: Re-added __bdev_dax_supported() for !CONFIG_FS_DAX cases]
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>


# b3a9a0c3 02-May-2018 Dan Williams <dan.j.williams@intel.com>

dax: Introduce a ->copy_to_iter dax operation

Similar to the ->copy_from_iter() operation, a platform may want to
deploy an architecture or device specific routine for handling reads
from a dax_device like /dev/pmemX. On x86 this routine will point to a
machine check safe version of copy_to_iter(). For now, add the plumbing
to device-mapper and the dax core.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e7638488 16-May-2018 Dan Williams <dan.j.williams@intel.com>

mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS

In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
be able to rely on the fact that they will get wakeups on dev_pagemap
page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
generic_dax_page_free() as common indicator / infrastructure for dax
filesytems to require. With this change there are no users of the
MEMORY_DEVICE_HOST designation, so remove it.

The HMM sub-system extended dev_pagemap to arrange a callback when a
dev_pagemap managed page is freed. Since a dev_pagemap page is free /
idle when its reference count is 1 it requires an additional branch to
check the page-type at put_page() time. Given put_page() is a hot-path
we do not want to incur that check if HMM is not in use, so a static
branch is used to avoid that overhead when not necessary.

Now, the FS_DAX implementation wants to reuse this mechanism for
receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
static-key into a generic mechanism that either HMM or FS_DAX code paths
can enable.

For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
However, we still need to support FS_DAX in the FS_DAX_LIMITED case
implemented by the s390/dcssblk driver.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 3fe0791c 14-Oct-2017 Dan Williams <dan.j.williams@intel.com>

dax: store pfns in the radix

In preparation for examining the busy state of dax pages in the truncate
path, switch from sectors to pfns in the radix.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9d4949b4 26-Feb-2018 Boaz Harrosh <boazh@netapp.com>

dax: ->direct_access does not sleep anymore

In Patch:
[7a862fb] brd: remove dax support

Dan Williams has removed the only might_sleep
implementation of ->direct_access.
So we no longer need to check for it.

CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Boaz Harrosh <boazh@netapp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 569d0365 14-Oct-2017 Dan Williams <dan.j.williams@intel.com>

dax: require 'struct page' by default for filesystem dax

If a dax buffer from a device that does not map pages is passed to
read(2) or write(2) as a target for direct-I/O it triggers SIGBUS. If
gdb attempts to examine the contents of a dax buffer from a device that
does not map pages it triggers SIGBUS. If fork(2) is called on a process
with a dax mapping from a device that does not map pages it triggers
SIGBUS. 'struct page' is required otherwise several kernel code paths
break in surprising ways. Disable filesystem-dax on devices that do not
map pages.

In addition to needing pfn_to_page() to be valid we also require devmap
pages. We need this to detect dax pages in the get_user_pages_fast()
path and so that we can stop managing the VM_MIXEDMAP flag. For DAX
drivers that have not supported get_user_pages() to date we allow them
to opt-in to supporting DAX with the CONFIG_FS_DAX_LIMITED configuration
option which requires ->direct_access() to return pfn_t_special() pfns.
This leaves DAX support in brd disabled and scheduled for removal.

Note that when the initial dax support was being merged a few years back
there was concern that struct page was unsuitable for use with next
generation persistent memory devices. The theoretical concern was that
struct page access, being such a hotly used data structure in the
kernel, would lead to media wear out. While that was a reasonable
conservative starting position it has not held true in practice. We have
long since committed to using devm_memremap_pages() to support higher
order kernel functionality that needs get_user_pages() and
pfn_to_page().


Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9f586fff 14-Nov-2017 Mikulas Patocka <mpatocka@redhat.com>

dax: fix general protection fault in dax_alloc_inode

Don't crash in case of allocation failure in dax_alloc_inode.

syzkaller hit the following crash on e4880bc5dfb1

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
[..]
RIP: 0010:dax_alloc_inode+0x3b/0x70 drivers/dax/super.c:348
Call Trace:
alloc_inode+0x65/0x180 fs/inode.c:208
new_inode_pseudo+0x69/0x190 fs/inode.c:890
new_inode+0x1c/0x40 fs/inode.c:919
mount_pseudo_xattr+0x288/0x560 fs/libfs.c:261
mount_pseudo include/linux/fs.h:2137 [inline]
dax_mount+0x2e/0x40 drivers/dax/super.c:388
mount_fs+0x66/0x2d0 fs/super.c:1223

Cc: <stable@vger.kernel.org>
Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider...")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 6a21586a 14-Oct-2017 Dan Williams <dan.j.williams@intel.com>

dax: stop requiring a live device for dax_flush()

Now that dax_flush() is no longer a driver callback (commit c3ca015fab6d
"dax: remove the pmem_dax_ops->flush abstraction"), stop requiring the
dax_read_lock() to be held and the device to be alive. This is in
preparation for switching filesystem-dax to store pfns instead of
sectors in the radix.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 66a86cc1 14-Oct-2017 Dan Williams <dan.j.williams@intel.com>

dax: quiet bdev_dax_supported()

Before we add another failure reason, quiet the existing log messages.
Leave it to the caller to decide if bdev_dax_supported() failures are
errors worth emitting to the log.

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# d083e6da 25-Sep-2017 Arvind Yadav <arvind.yadav.cs@gmail.com>

dax: pr_err() strings should end with newlines

pr_err() messages should terminated with a new-line to avoid
other messages being concatenated onto the end.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# c3ca015f 31-Aug-2017 Mikulas Patocka <mpatocka@redhat.com>

dax: remove the pmem_dax_ops->flush abstraction

Commit abebfbe2f731 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2f731 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 26f2f4de 03-Sep-2017 Dan Williams <dan.j.williams@intel.com>

dax: fix FS_DAX=n BLOCK=y compilation

The 0day kbuild robot reports:

>> drivers//dax/super.c:64:20: error: redefinition of 'fs_dax_get_by_bdev'
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
^~~~~~~~~~~~~~~~~~
In file included from drivers//dax/super.c:22:0:
include/linux/dax.h:76:34: note: previous definition of 'fs_dax_get_by_bdev' was here
static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
^~~~~~~~~~~~~~~~~~

Protect the definition of fs_dax_get_by_bdev() in drivers/dax/super.c
with an ifdef.

Fixes: 78f354735081 ("dax: introduce a fs_dax_get_by_bdev() helper")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 78f35473 30-Aug-2017 Dan Williams <dan.j.williams@intel.com>

dax: introduce a fs_dax_get_by_bdev() helper

Add a helper that can replace the following common pattern:

if (blk_queue_dax(bdev->bd_queue))
fs_dax_get_by_host(bdev->bd_disk->disk_name);

This will be used to move dax_device lookup from iomap-operation time to
fs-mount time.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 273752c9 26-Jul-2017 Vivek Goyal <vgoyal@redhat.com>

dm, dax: Make sure dm_dax_flush() is called if device supports it

Currently dm_dax_flush() is not being called, even if underlying dax
device supports write cache, because DAXDEV_WRITE_CACHE is not being
propagated up to the DM dax device.

If the underlying dax device supports write cache, set
DAXDEV_WRITE_CACHE on the DM dax device. This will cause dm_dax_flush()
to be called.

Fixes: abebfbe2f7 ("dm: add ->flush() dax operation support")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>


# 6e0c90d6 26-Jun-2017 Dan Williams <dan.j.williams@intel.com>

libnvdimm, pmem, dax: export a cache control attribute

The dax_flush() operation can be turned into a nop on platforms where
firmware arranges for cpu caches to be flushed on a power-fail event.
The ACPI 6.2 specification defines a mechanism for the platform to
indicate this capability so the kernel can select the proper default.
However, for other platforms, the administrator must toggle this setting
manually.

Given this flush setting is a dax-specific mechanism we advertise it
through a 'dax' attribute group hanging off a host device. For example,
a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
'write_cache' attribute to control response to dax cache flush requests.
This is similar to the 'queue/write_cache' attribute that appears under
block devices.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9a60c3ef 27-Jun-2017 Dan Williams <dan.j.williams@intel.com>

dax: convert to bitmask for flags

In preparation for adding more flags, convert the existing flag to a
bit-flag.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 5d61e43b 27-Jun-2017 Dan Williams <dan.j.williams@intel.com>

dax: remove default copy_from_iter fallback

Require all dax-drivers to register a ->copy_from_iter() operation so
that it is clear which dax_operations are optional and which must be
implemented for filesystem-dax to operate.

Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# abebfbe2 29-May-2017 Dan Williams <dan.j.williams@intel.com>

dm: add ->flush() dax operation support

Allow device-mapper to route flush operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying flush implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 7e026c8c 29-May-2017 Dan Williams <dan.j.williams@intel.com>

dm: add ->copy_from_iter() dax operation support

Allow device-mapper to route copy_from_iter operations to the
per-target implementation. In order for the device stacking to work we
need a dax_dev and a pgoff relative to that device. This gives each
layer of the stack the information it needs to look up the operation
pointer for the next level.

This conceptually allows for an array of mixed device drivers with
varying copy_from_iter implementations.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# b9d39d17 09-Jun-2017 Dan Williams <dan.j.williams@intel.com>

device-dax: fix 'dax' device filesystem inode destruction crash

The inode destruction path for the 'dax' device filesystem incorrectly
assumes that the inode was initialized through 'alloc_dax()'. However,
if someone attempts to directly mount the dax filesystem with 'mount -t
dax dax mnt' that will bypass 'alloc_dax()' and the following failure
signatures may occur as a result:

kill_dax() must be called before final iput()
WARNING: CPU: 2 PID: 1188 at drivers/dax/super.c:243 dax_destroy_inode+0x48/0x50
RIP: 0010:dax_destroy_inode+0x48/0x50
Call Trace:
destroy_inode+0x3b/0x60
evict+0x139/0x1c0
iput+0x1f9/0x2d0
dentry_unlink_inode+0xc3/0x160
__dentry_kill+0xcf/0x180
? dput+0x37/0x3b0
dput+0x3a3/0x3b0
do_one_tree+0x36/0x40
shrink_dcache_for_umount+0x2d/0x90
generic_shutdown_super+0x1f/0x120
kill_anon_super+0x12/0x20
deactivate_locked_super+0x43/0x70
deactivate_super+0x4e/0x60

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
RIP: 0010:kfree+0x6d/0x290
Call Trace:
<IRQ>
dax_i_callback+0x22/0x60
? dax_destroy_inode+0x50/0x50
rcu_process_callbacks+0x298/0x740

ida_remove called for id=0 which is not allocated.
WARNING: CPU: 0 PID: 0 at lib/idr.c:383 ida_remove+0x110/0x120
[..]
Call Trace:
<IRQ>
ida_simple_remove+0x2b/0x50
? dax_destroy_inode+0x50/0x50
dax_i_callback+0x3c/0x60
rcu_process_callbacks+0x298/0x740

Add missing initialization of the 'struct dax_device' and inode so that
the destruction path does not kfree() or ida_simple_remove()
uninitialized data.

Fixes: 7b6be8444e0f ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances")
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9d109081 13-May-2017 Dan Williams <dan.j.williams@intel.com>

dax: fix false CONFIG_BLOCK dependency

In the BLOCK=n case the dax core does not need to / must not emit the
block-device-dax helpers. Otherwise it leads to compile errors.

Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: Fabian Frederick <fabf@skynet.be>
Fixes: ef51042472f5 ("block, dax: move 'select DAX' from BLOCK to FS_DAX")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# cf1e2289 08-May-2017 Dan Williams <dan.j.williams@intel.com>

device-dax: kill NR_DEV_DAX

There is no point to ask how many device-dax instances the kernel should
support. Since we are already using a dynamic major number, just allow
the max number of minors by default and be done. This also fixes the
fact that the proposed max for the NR_DEV_DAX range was larger than what
could be supported by alloc_chrdev_region().

Fixes: ba09c01d2fa8 ("dax: convert to the cdev api")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# ef510424 08-May-2017 Dan Williams <dan.j.williams@intel.com>

block, dax: move "select DAX" from BLOCK to FS_DAX

For configurations that do not enable DAX filesystems or drivers, do not
require the DAX core to be built.

Given that the 'direct_access' method has been removed from
'block_device_operations', we can also go ahead and remove the
block-related dax helper functions from fs/block_dev.c to
drivers/dax/super.c. This keeps dax details out of the block layer and
lets the DAX core be built as a module in the FS_DAX=n case.

Filesystems need to include dax.h to call bdev_dax_supported().

Cc: linux-xfs@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# b0686260 26-Jan-2017 Dan Williams <dan.j.williams@intel.com>

dax: introduce dax_direct_access()

Replace bdev_direct_access() with dax_direct_access() that uses
dax_device and dax_operations instead of a block_device and
block_device_operations for dax. Once all consumers of the old api have
been converted bdev_direct_access() will be deleted.

Given that block device partitioning decisions can cause dax page
alignment constraints to be violated this also introduces the
bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
to the dax_device and also checks for page alignment.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 6568b08b 24-Jan-2017 Dan Williams <dan.j.williams@intel.com>

dax: introduce dax_operations

Track a set of dax_operations per dax_device that can be set at
alloc_dax() time. These operations will be used to stop the abuse of
block_device_operations for communicating dax capabilities to
filesystems. It will also be used to replace the "pmem api" and move
pmem-specific cache maintenance, and other dax-driver-specific
filesystem-dax operations, to dax device methods. In particular this
allows us to stop abusing __copy_user_nocache(), via memcpy_to_pmem(),
with a driver specific replacement.

This is a standalone introduction of the operations. Follow on patches
convert each dax-driver and teach fs/dax.c to use ->direct_access() from
dax_operations instead of block_device_operations.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 72058005 19-Apr-2017 Dan Williams <dan.j.williams@intel.com>

dax: add a facility to lookup a dax device by 'host' device name

For the current block_device based filesystem-dax path, we need a way
for it to lookup the dax_device associated with a block_device. Add a
'host' property of a dax_device that can be used for this purpose. It is
a free form string, but for a dax_device associated with a block device
it is the bdev name.

This is a stop-gap until filesystems are able to mount on a dax-inode
directly.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 7b6be844 11-Apr-2017 Dan Williams <dan.j.williams@intel.com>

dax: refactor dax-fs into a generic provider of 'struct dax_device' instances

We want dax capable drivers to be able to publish a set of dax
operations [1]. However, we do not want to further abuse block_devices
to advertise these operations. Instead we will attach these operations
to a dax device and add a lookup mechanism to go from block device path
to a dax device. A dax capable driver like pmem or brd is responsible
for registering a dax device, alongside a block device, and then a dax
capable filesystem is responsible for retrieving the dax device by path
name if it wants to call dax_operations.

For now, we refactor the dax pseudo-fs to be a generic facility, rather
than an implementation detail, of the device-dax use case. Where a "dax
device" is just an inode + dax infrastructure, and "Device DAX" is a
mapping service layered on top of that base 'struct dax_device'.
"Filesystem DAX" is then a mapping service that layers a filesystem on
top of that same base device. Filesystem DAX is associated with a
block_device for now, but perhaps directly to a dax device in the
future, or for new pmem-only filesystems.

[1]: https://lkml.org/lkml/2017/1/19/880

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>