#
bc22374c |
|
04-Feb-2024 |
Ricardo B. Marliere <ricardo@marliere.net> |
device-dax: make dax_bus_type const Now that the driver core can properly handle constant struct bus_type, move the dax_bus_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20240204-bus_cleanup-dax-v1-1-69a6e4a8553b@marliere.net Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
#
73954d37 |
|
24-Jan-2024 |
Vishal Verma <vishal.l.verma@intel.com> |
dax: add a sysfs knob to control memmap_on_memory behavior Add a sysfs knob for dax devices to control the memmap_on_memory setting if the dax device were to be hotplugged as system memory. The default memmap_on_memory setting for dax devices originating via pmem or hmem is set to 'false' - i.e. no memmap_on_memory semantics, to preserve legacy behavior. For dax devices via CXL, the default is on. The sysfs control allows the administrator to override the above defaults if needed. Link: https://lkml.kernel.org/r/20240124-vv-dax_abi-v7-5-20d16cb8d23d@intel.com Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Huang, Ying <ying.huang@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
6ebed000 |
|
24-Jan-2024 |
Vishal Verma <vishal.l.verma@intel.com> |
dax/bus.c: replace several sprintf() with sysfs_emit() There were several places where drivers/dax/bus.c uses 'sprintf' to print sysfs data. Since a sysfs_emit() helper is available specifically for this purpose, replace all the sprintf() usage for sysfs with sysfs_emit() in this file. Link: https://lkml.kernel.org/r/20240124-vv-dax_abi-v7-2-20d16cb8d23d@intel.com Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
c05ae9d8 |
|
24-Jan-2024 |
Vishal Verma <vishal.l.verma@intel.com> |
dax/bus.c: replace driver-core lock usage by a local rwsem Patch series "Add DAX ABI for memmap_on_memory", v7. This series adds sysfs ABI to control memmap_on_memory behavior for DAX devices. Patch 1 replaces incorrect device_lock() usage with a local rwsem - this was identified during review. Patch 2 is also a preparatory patch that replaces sprintf() for sysfs operations with sysfs_emit() Patch 3 adds the missing documentation for the sysfs ABI for DAX regions and Dax devices. Patch 4 exports mhp_supports_memmap_on_memory(). Patch 5 adds the new ABI for toggling memmap_on_memory semantics for dax devices. This patch (of 5): The dax driver incorrectly used driver-core device locks to protect internal dax region and dax device configuration structures. Replace the device lock usage with a local rwsem, one each for dax region configuration and dax device configuration. As a result of this conversion, no device_lock() usage remains in dax/bus.c. Link: https://lkml.kernel.org/r/20240124-vv-dax_abi-v7-0-20d16cb8d23d@intel.com Link: https://lkml.kernel.org/r/20240124-vv-dax_abi-v7-1-20d16cb8d23d@intel.com Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Li Zhijian <lizhijian@fujitsu.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
4eca0ef4 |
|
07-Nov-2023 |
Vishal Verma <vishal.l.verma@intel.com> |
dax/kmem: allow kmem to add memory with memmap_on_memory Large amounts of memory managed by the kmem driver may come in via CXL, and it is often desirable to have the memmap for this memory on the new memory itself. Enroll kmem-managed memory for memmap_on_memory semantics if the dax region originates via CXL. For non-CXL dax regions, retain the existing default behavior of hot adding without memmap_on_memory semantics. Link: https://lkml.kernel.org/r/20231107-vv-kmem_memmap-v10-3-1253ec050ed0@intel.com Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Tested-by: Li Zhijian <lizhijian@fujitsu.com> [cxl.kmem and nvdimm.kmem] Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Fan Ni <fan.ni@samsung.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
#
75b3d5dc |
|
12-Sep-2023 |
Justin Stitt <justinstitt@google.com> |
dax: refactor deprecated strncpy `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We should prefer more robust and less ambiguous string interfaces. `dax_id->dev_name` is expected to be NUL-terminated and has been zero-allocated. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer. Moreover, due to `dax_id` being zero-allocated the padding behavior of `strncpy` is not needed and a simple 1:1 replacement of strncpy -> strscpy should suffice. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
#
2532f416 |
|
03-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Cleanup extra dax_region references Now that free_dev_dax_id() internally manages the references it needs the extra references taken by the dax_region drivers are not needed. Reported-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/168577285161.1672036.8111253437794419696.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
#
70aab281 |
|
03-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Introduce alloc_dev_dax_id() The reference counting of dax_region objects is needlessly complicated, has lead to confusion [1], and has hidden a bug [2]. Towards cleaning up that mess introduce alloc_dev_dax_id() to minimize the holding of a dax_region reference to only what dev_dax_release() needs, the dax_region->ida. Part of the reason for the mess was the design to dereference a dax_region in all cases in free_dev_dax_id() even if the id was statically assigned by the upper level dax_region driver. Remove the need to call "is_static(dax_region)" by tracking whether the id is dynamic directly in the dev_dax instance itself. With that flag the dax_region pinning and release per dev_dax instance can move to alloc_dev_dax_id() and free_dev_dax_id() respectively. A follow-on cleanup address the unnecessary references in the dax_region setup and drivers. Fixes: 0f3da14a4f05 ("device-dax: introduce 'seed' devices") Link: http://lore.kernel.org/r/20221203095858.612027-1-liuyongqiang13@huawei.com [1] Link: http://lore.kernel.org/r/3cf0890b-4eb0-e70e-cd9c-2ecc3d496263@hpe.com [2] Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com> Reported-by: Paul Cassella <cassella@hpe.com> Reported-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/168577284563.1672036.13493034988900989554.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
#
82b4ceec |
|
03-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Use device_unregister() in unregister_dax_mapping() Replace an open-coded device_unregister() sequence with the helper. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/168577283989.1672036.7777592498865470652.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
#
6d24b170 |
|
03-Jun-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Fix dax_mapping_release() use after free A CONFIG_DEBUG_KOBJECT_RELEASE test of removing a device-dax region provider (like modprobe -r dax_hmem) yields: kobject: 'mapping0' (ffff93eb460e8800): kobject_release, parent 0000000000000000 (delayed 2000) [..] DEBUG_LOCKS_WARN_ON(1) WARNING: CPU: 23 PID: 282 at kernel/locking/lockdep.c:232 __lock_acquire+0x9fc/0x2260 [..] RIP: 0010:__lock_acquire+0x9fc/0x2260 [..] Call Trace: <TASK> [..] lock_acquire+0xd4/0x2c0 ? ida_free+0x62/0x130 _raw_spin_lock_irqsave+0x47/0x70 ? ida_free+0x62/0x130 ida_free+0x62/0x130 dax_mapping_release+0x1f/0x30 device_release+0x36/0x90 kobject_delayed_cleanup+0x46/0x150 Due to attempting ida_free() on an ida object that has already been freed. Devices typically only hold a reference on their parent while registered. If a child needs a parent object to complete its release it needs to hold a reference that it drops from its release callback. Arrange for a dax_mapping to pin its parent dev_dax instance until dax_mapping_release(). Fixes: 0b07ce872a9e ("device-dax: introduce 'mapping' devices") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/168577283412.1672036.16111545266174261446.stgit@dwillia2-xfh.jf.intel.com Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
#
e686c325 |
|
16-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax/kmem: Fix leak of memory-hotplug resources While experimenting with CXL region removal the following corruption of /proc/iomem appeared. Before: f010000000-f04fffffff : CXL Window 0 f010000000-f02fffffff : region4 f010000000-f02fffffff : dax4.0 f010000000-f02fffffff : System RAM (kmem) After (modprobe -r cxl_test): f010000000-f02fffffff : **redacted binary garbage** f010000000-f02fffffff : System RAM (kmem) ...and testing further the same is visible with persistent memory assigned to kmem: Before: 480000000-243fffffff : Persistent Memory 480000000-57e1fffff : namespace3.0 580000000-243fffffff : dax3.0 580000000-243fffffff : System RAM (kmem) After (ndctl disable-region all): 480000000-243fffffff : Persistent Memory 580000000-243fffffff : ***redacted binary garbage*** 580000000-243fffffff : System RAM (kmem) The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0" resources, and it also shows that the "System RAM (kmem)" resource is not being removed. The bug does not appear after "modprobe -r kmem", it requires the parent of "dax4.0" and "dax3.0" to be removed which re-parents the leaked "System RAM (kmem)" instances. Those in turn reference the freed resource as a parent. First up for the fix is release_mem_region_adjustable() needs to reliably delete the resource inserted by add_memory_driver_managed(). That is thwarted by a check for IORESOURCE_SYSRAM that predates the dax/kmem driver, from commit: 65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable") That appears to be working around the behavior of HMM's "MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that check removed the "System RAM (kmem)" resource gets removed, but corruption still occurs occasionally because the "dax" resource is not reliably removed. The dax range information is freed before the device is unregistered, so the driver can not reliably recall (another use after free) what it is meant to release. Lastly if that use after free got lucky, the driver was covering up the leak of "System RAM (kmem)" due to its use of release_resource() which detaches, but does not free, child resources. The switch to remove_resource() forces remove_memory() to be responsible for the deletion of the resource added by add_memory_driver_managed(). Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances") Cc: <stable@vger.kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167653656244.3147810.5705900882794040229.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e9ee9fe3 |
|
10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Assign RAM regions to memory-hotplug by default The default mode for device-dax instances is backwards for RAM-regions as evidenced by the fact that it tends to catch end users by surprise. "Where is my memory?". Recall that platforms are increasingly shipping with performance-differentiated memory pools beyond typical DRAM and NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic interleave, varied media types, and future fabric attached possibilities). For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux 'Soft Reserved') attribute is expected to be applied to all memory-pools that are not the general purpose pool. This designation gives an Operating System a chance to defer usage of a memory pool until later in the boot process where its performance properties can be interrogated and administrator policy can be applied. 'Soft Reserved' memory can be anything from too limited and precious to be part of the general purpose pool (HBM), too slow to host hot kernel data structures (some PMEM media), or anything in between. However, in the absence of an explicit policy, the memory should at least be made usable by default. The current device-dax default hides all non-general-purpose memory behind a device interface. The expectation is that the distribution of users that want the memory online by default vs device-dedicated-access by default follows the Pareto principle. A small number of enlightened users may want to do userspace memory management through a device, but general users just want the kernel to make the memory available with an option to get more advanced later. Arrange for all device-dax instances not backed by PMEM to default to attaching to the dax_kmem driver. From there the baseline memory hotplug policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) gates whether the memory comes online or stays offline. Where, if it stays offline, it can be reliably converted back to device-mode where it can be partitioned, or fronted by a userspace allocator. So, if someone wants device-dax instances for their 'Soft Reserved' memory: 1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot with memhp_default_state=offline, or roll the dice and hope that the kernel has not pinned a page in that memory before step 2. 2/ Write a udev rule to convert the target dax device(s) from 'system-ram' mode to 'devdax' mode: daxctl reconfigure-device $dax -m devdax -f Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
2a81ada3 |
|
10-Jan-2023 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
driver core: make struct bus_type.uevent() take a const * The uevent() callback in struct bus_type should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
fc65c4eb |
|
14-Jan-2022 |
Joao Martins <joao.m.martins@oracle.com> |
device-dax: ensure dev_dax->pgmap is valid for dynamic devices Right now, only static dax regions have a valid @pgmap pointer in its struct dev_dax. Dynamic dax case however, do not. In preparation for device-dax compound devmap support, make sure that dev_dax pgmap field is set after it has been allocated and initialized. dynamic dax device have the @pgmap is allocated at probe() and it's managed by devm (contrast to static dax region which a pgmap is provided and dax core kfrees it). So in addition to ensure a valid @pgmap, clear the pgmap when the dynamic dax device is released to avoid the same pgmap ranges to be re-requested across multiple region device reconfigs. Add a static_dev_dax() and use that helper in dev_dax_probe() to ensure the initialization differences between dynamic and static regions are more explicit. While at it, consolidate the ranges initialization when we allocate the @pgmap for the dynamic dax region case. Also take the opportunity to document the differences between static and dynamic da regions. Link: https://lkml.kernel.org/r/20211202204422.26777-8-joao.m.martins@oracle.com Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
7ac5360c |
|
15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove the copy_from_iter and copy_to_iter methods These methods indirect the actual DAX read/write path. In the end pmem uses magic flush and mc safe variants and fuse and dcssblk use plain ones while device mapper picks redirects to the underlying device. Add set_dax_nocache() and set_dax_nomc() APIs to control which copy routines are used to remove indirect call from the read/write fast path as well as a lot of boilerplate code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs] Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
30c6828a |
|
15-Dec-2021 |
Christoph Hellwig <hch@lst.de> |
dax: remove the DAXDEV_F_SYNC flag Remove the DAXDEV_F_SYNC flag and thus the flags argument to alloc_dax and just let the drivers call set_dax_synchronous directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://lore.kernel.org/r/20211215084508.435401-4-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
fb08a190 |
|
29-Nov-2021 |
Christoph Hellwig <hch@lst.de> |
dax: simplify the dax_device <-> gendisk association Replace the dax_host_hash with an xarray indexed by the pointer value of the gendisk, and require explicitly calls from the block drivers that want to associate their gendisk with a dax_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-5-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
83762cb5 |
|
15-Nov-2021 |
Dan Williams <dan.j.williams@intel.com> |
dax: Kill DEV_DAX_PMEM_COMPAT The /sys/class/dax compatibility option has shipped in the kernel for 4 years now which should be sufficient time for tools to abandon the old ABI in favor of the /sys/bus/dax device-model. Delete it now and see if anyone screams. Since this compatibility option shipped there has been more reports of users being surprised by the compat ABI than surprised by the "new", so the compat infrastructure has outlived its usefulness. Recall that /sys/bus/dax device-model is required for the dax kmem driver which allows PMEM to be used as "System RAM". The following projects were known to have a dependency on /sys/class/dax and have dropped their dependency as of the listed version: - ndctl (including libndctl, daxctl, and libdaxctl): v64+ - fio: v3.13+ - pmdk: v1.5.2+ As further evidence this option is no longer needed some distributions have already stopped enabling CONFIG_DEV_DAX_PMEM_COMPAT. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Link: https://lore.kernel.org/r/163701116195.3784476.726128179293466337.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
fc7a6209 |
|
13-Jul-2021 |
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> |
bus: Make remove callback return void The driver core ignores the return value of this callback because there is only little it can do when a device disappears. This is the final bit of a long lasting cleanup quest where several buses were converted to also return void from their remove callback. Additionally some resource leaks were fixed that were caused by drivers returning an error code in the expectation that the driver won't go away. With struct bus_type::remove returning void it's prevented that newly implemented buses return an ignored error code and so don't anticipate wrong expectations for driver authors. Reviewed-by: Tom Rix <trix@redhat.com> (For fpga) Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> (For drivers/s390 and drivers/vfio) Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> (For ARM, Amba and related parts) Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Chen-Yu Tsai <wens@csie.org> (for sunxi-rsb) Acked-by: Pali Rohár <pali@kernel.org> Acked-by: Mauro Carvalho Chehab <mchehab@kernel.org> (for media) Acked-by: Hans de Goede <hdegoede@redhat.com> (For drivers/platform) Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Acked-By: Vinod Koul <vkoul@kernel.org> Acked-by: Juergen Gross <jgross@suse.com> (For xen) Acked-by: Lee Jones <lee.jones@linaro.org> (For mfd) Acked-by: Johannes Thumshirn <jth@kernel.org> (For mcb) Acked-by: Johan Hovold <johan@kernel.org> Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> (For slimbus) Acked-by: Kirti Wankhede <kwankhede@nvidia.com> (For vfio) Acked-by: Maximilian Luz <luzmaximilian@gmail.com> Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> (For ulpi and typec) Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> (For ipack) Acked-by: Geoff Levand <geoff@infradead.org> (For ps3) Acked-by: Yehezkel Bernat <YehezkelShB@gmail.com> (For thunderbolt) Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> (For intel_th) Acked-by: Dominik Brodowski <linux@dominikbrodowski.net> (For pcmcia) Acked-by: Rafael J. Wysocki <rafael@kernel.org> (For ACPI) Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> (rpmsg and apr) Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> (For intel-ish-hid) Acked-by: Dan Williams <dan.j.williams@intel.com> (For CXL, DAX, and NVDIMM) Acked-by: William Breathitt Gray <vilhelm.gray@gmail.com> (For isa) Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (For firewire) Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> (For hid) Acked-by: Thorsten Scherer <t.scherer@eckelmann.de> (For siox) Acked-by: Sven Van Asbroeck <TheSven73@gmail.com> (For anybuss) Acked-by: Ulf Hansson <ulf.hansson@linaro.org> (For MMC) Acked-by: Wolfram Sang <wsa@kernel.org> # for I2C Acked-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Finn Thain <fthain@linux-m68k.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/20210713193522.1770306-6-u.kleine-koenig@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
9de82caa |
|
21-Mar-2021 |
Arnd Bergmann <arnd@arndb.de> |
dax: avoid -Wempty-body warnings gcc warns about an empty body in an else statement: drivers/dax/bus.c: In function 'do_id_store': drivers/dax/bus.c:94:48: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body] 94 | /* nothing to remove */; | ^ drivers/dax/bus.c:99:43: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body] 99 | /* dax_id already added */; | ^ In both of these cases, the 'else' exists only to have a place to add a comment, but that comment doesn't really explain that much either, so the easiest way to shut up that warning is to just remove the else. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20210322114514.3490752-1-arnd@kernel.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
0d519e0d |
|
05-Feb-2021 |
Uwe Kleine-König <uwe@kleine-koenig.org> |
dax-device: Make remove callback return void The driver core ignores the return value of struct bus_type::remove() because there is only little that can be done. To simplify the quest to make this function return void, let struct dax_device_driver::remove() return void, too. All users already unconditionally return 0, this commit makes it obvious that returning an error code isn't intended. Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Link: https://lore.kernel.org/r/20210205222842.34896-6-uwe@kleine-koenig.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
e307bf11 |
|
05-Feb-2021 |
Uwe Kleine-König <uwe@kleine-koenig.org> |
device-dax: Fix error path in dax_driver_register The static variable match_always_count is supposed to track if there is a driver registered that has .match_always set. If driver_register() fails, the previous increment must be undone. Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Link: https://lore.kernel.org/r/20210205222842.34896-4-uwe@kleine-koenig.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8029968e |
|
05-Feb-2021 |
Uwe Kleine-König <uwe@kleine-koenig.org> |
device-dax: Properly handle drivers without remove callback If all resources are allocated in .probe() using devm_ functions it might make sense to not provide a .remove() callback. Then the right thing is to just return success. Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Link: https://lore.kernel.org/r/20210205222842.34896-3-uwe@kleine-koenig.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
5b8e64f1 |
|
05-Feb-2021 |
Uwe Kleine-König <uwe@kleine-koenig.org> |
device-dax: Prevent registering drivers without probe callback The bus probe function dax_bus_probe() calls the probe callback without checking it to be non-NULL. Prevent a NULL pointer exception if a driver without a probe function is registered by refusing to register this driver. Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Link: https://lore.kernel.org/r/20210205222842.34896-2-uwe@kleine-koenig.org Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
7323fb22 |
|
25-Jan-2021 |
Shiyang Ruan <ruansy.fnst@cn.fujitsu.com> |
device-dax: Fix default return code of range_parse() The return value of range_parse() indicates the size when it is positive. The error code should be negative. Signed-off-by: Shiyang Ruan <ruansy.fnst@cn.fujitsu.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20210126021331.1059933-1-ruansy.fnst@cn.fujitsu.com Reported-by: Zhang Qilong <zhangqilong3@huawei.com> Fixes: 8490e2e25b5a ("device-dax: add a range mapping allocation attribute") Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
ff8da37d |
|
19-Dec-2020 |
Zhen Lei <thunder.leizhen@huawei.com> |
device-dax: Avoid an unnecessary check in alloc_dev_dax_range() Swap the calling sequence of krealloc() and __request_region(), call the latter first. In this way, the value of dev_dax->nr_range does not need to be considered when __request_region() failed. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20201219081840.1149-2-thunder.leizhen@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
6268d7da |
|
18-Dec-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Fix range release There are multiple locations that open-code the release of the last range in a device-dax instance. Consolidate this into a new dev_dax_trim_range() helper. This also addresses a kmemleak report: # cat /sys/kernel/debug/kmemleak [..] unreferenced object 0xffff976bd46f6240 (size 64): comm "ndctl", pid 23556, jiffies 4299514316 (age 5406.733s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 20 c3 37 00 00 00 .......... .7... ff ff ff 7f 38 00 00 00 00 00 00 00 00 00 00 00 ....8........... backtrace: [<00000000064003cf>] __kmalloc_track_caller+0x136/0x379 [<00000000d85e3c52>] krealloc+0x67/0x92 [<00000000d7d3ba8a>] __alloc_dev_dax_range+0x73/0x25c [<0000000027d58626>] devm_create_dev_dax+0x27d/0x416 [<00000000434abd43>] __dax_pmem_probe+0x1c9/0x1000 [dax_pmem_core] [<0000000083726c1c>] dax_pmem_probe+0x10/0x1f [dax_pmem] [<00000000b5f2319c>] nvdimm_bus_probe+0x9d/0x340 [libnvdimm] [<00000000c055e544>] really_probe+0x230/0x48d [<000000006cabd38e>] driver_probe_device+0x122/0x13b [<0000000029c7b95a>] device_driver_attach+0x5b/0x60 [<0000000053e5659b>] bind_store+0xb7/0xc3 [<00000000d3bdaadc>] drv_attr_store+0x27/0x31 [<00000000949069c5>] sysfs_kf_write+0x4a/0x57 [<000000004a8b5adf>] kernfs_fop_write+0x150/0x1e5 [<00000000bded60f0>] __vfs_write+0x1b/0x34 [<00000000b92900f0>] vfs_write+0xd8/0x1d1 Reported-by: Jane Chu <jane.chu@oracle.com> Cc: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/160834570161.1791850.14911670304441510419.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8bcbe313 |
|
20-Nov-2020 |
Zhen Lei <thunder.leizhen@huawei.com> |
device-dax: delete a redundancy check in dev_dax_validate_align() After we have done the alignment check for the length of each range, the alignment check for dev_dax_size(dev_dax) is no longer needed, because it get the sum of the length of each range. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20201120092057.2144-1-thunder.leizhen@huawei.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8490e2e2 |
|
13-Oct-2020 |
Joao Martins <joao.m.martins@oracle.com> |
device-dax: add a range mapping allocation attribute Add a sysfs attribute which denotes a range from the dax region to be allocated. It's an write only @mapping sysfs attribute in the format of '<start>-<end>' to allocate a range. @start and @end use hexadecimal values and the @pgoff is implicitly ordered wrt to previous writes to @mapping sysfs e.g. a write of a range of length 1G the pgoff is 0..1G(-4K), a second write will use @pgoff for 1G+4K..<size>. This range mapping interface is useful for: 1) Application which want to implement its own allocation logic, and thus pick the desired ranges from dax_region. 2) For use cases like VMM fast restart[0] where after kexec we want to the same gpa<->phys mappings (as originally created before kexec). [0] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643106970.4062302.10402616567780784722.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-5-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106119570.30709.4548889722645210610.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
6d82120f |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: add an 'align' attribute Introduce a device align attribute. While doing so, rename the region align attribute to be more explicitly named as so, but keep it named as @align to retain the API for tools like daxctl. Changes on align may not always be valid, when say certain mappings were created with 2M and then we switch to 1G. So, we validate all ranges against the new value being attempted, post resizing. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643105944.4062302.3131761052969132784.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-3-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106118486.30709.13012322227204800596.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
33cf94d7 |
|
13-Oct-2020 |
Joao Martins <joao.m.martins@oracle.com> |
device-dax: make align a per-device property Introduce @align to struct dev_dax. When creating a new device, we still initialize to the default dax_region @align. Child devices belonging to a region may wish to keep a different alignment property instead of a global region-defined one. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643105377.4062302.4159447829955683131.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-2-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106117957.30709.1142303024324655705.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0b07ce87 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: introduce 'mapping' devices In support of interrogating the physical address layout of a device with dis-contiguous ranges, introduce a sysfs directory with 'start', 'end', and 'page_offset' attributes. The alternative is trying to parse /proc/iomem, and that file will not reflect the extent layout until the device is enabled. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643104819.4062302.13691281391423291589.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106117446.30709.2751020815463722537.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
60e93dc0 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: add dis-contiguous resource support Break the requirement that device-dax instances are physically contiguous. With this constraint removed it allows fragmented available capacity to be fully allocated. This capability is useful to mitigate the "noisy neighbor" problem with memory-side-cache management for virtual machines, or any other scenario where a platform address boundary also designates a performance boundary. For example a direct mapped memory side cache might rotate cache colors at 1GB boundaries. With dis-contiguous allocations a device-dax instance could be configured to contain only 1 cache color. It also satisfies Joao's use case (see link) for partitioning memory for exclusive guest access. It allows for a future potential mode where the host kernel need not allocate 'struct page' capacity up-front. Reported-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/lkml/20200110190313.17144-1-joao.m.martins@oracle.com/ Link: https://lkml.kernel.org/r/159643104304.4062302.16561669534797528660.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106116875.30709.11456649969327399771.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
a4574f63 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
mm/memremap_pages: convert to 'struct range' The 'struct resource' in 'struct dev_pagemap' is only used for holding resource span information. The other fields, 'name', 'flags', 'desc', 'parent', 'sibling', and 'child' are all unused wasted space. This is in preparation for introducing a multi-range extension of devm_memremap_pages(). The bulk of this change is unwinding all the places internal to libnvdimm that used 'struct resource' unnecessarily, and replacing instances of 'struct dev_pagemap'.res with 'struct dev_pagemap'.range. P2PDMA had a minor usage of the resource flags field, but only to report failures with "%pR". That is replaced with an open coded print of the range. [dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()] Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen] Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: kernel test robot <lkp@intel.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
fcffb6a1 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: add resize support Make the device-dax 'size' attribute writable to allow capacity to be split between multiple instances in a region. The intended consumers of this capability are users that want to split a scarce memory resource between device-dax and System-RAM access, or users that want to have multiple security domains for a large region. By default the hmem instance provider allocates an entire region to the first instance. The process of creating a new instance (assuming a region-id of 0) is find the region and trigger the 'create' attribute which yields an empty instance to configure. For example: cd /sys/bus/dax/devices echo dax0.0 > dax0.0/driver/unbind echo $new_size > dax0.0/size echo 1 > $(readlink -f dax0.0)../dax_region/create seed=$(cat $(readlink -f dax0.0)../dax_region/seed) echo $new_size > $seed/size echo dax0.0 > ../drivers/{device_dax,kmem}/bind echo dax0.1 > ../drivers/{device_dax,kmem}/bind Instances can be destroyed by: echo $device > $(readlink -f $device)../dax_region/delete Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643102625.4062302.7431838945566033852.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106115239.30709.9850106928133493138.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
0f3da14a |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: introduce 'seed' devices Add a seed device concept for dynamic dax regions to be able to split the region amongst multiple sub-instances. The seed device, similar to libnvdimm seed devices, is a device that starts with zero capacity allocated and unbound to a driver. In contrast to libnvdimm seed devices explicit 'create' and 'delete' interfaces are added to the region to trigger seeds to be created and unused devices to be reclaimed. The explicit create and delete replaces implicit create as a side effect of probe and implicit delete when writing 0 to the size that libnvdimm implements. Delete can be performed on any 0-sized and idle device. This avoids the gymnastics of needing to move device_unregister() to its own async context. Specifically, it avoids the deadlock of deleting a device via one of its own attributes. It is also less surprising to userspace which never sees an extra device it did not request. For now just add the device creation, teardown, and ->probe() prevention. A later patch will arrange for the 'dax/size' attribute to be writable to allocate capacity from the region. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643101583.4062302.12255093902950754962.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106113873.30709.15168756050631539431.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f11cf813 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: introduce 'struct dev_dax' typed-driver operations In preparation for introducing seed devices the dax-bus core needs to be able to intercept ->probe() and ->remove() operations. Towards that end arrange for the bus and drivers to switch from raw 'struct device' driver operations to 'struct dev_dax' typed operations. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Yan <yanaijie@huawei.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/160106113357.30709.4541750544799737855.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
c2f3011e |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: add an allocation interface for device-dax instances In preparation for a facility that enables dax regions to be sub-divided, introduce infrastructure to track and allocate region capacity. The new dax_region/available_size attribute is only enabled for volatile hmem devices, not pmem devices that are defined by nvdimm namespace boundaries. This is per Jeff's feedback the last time dynamic device-dax capacity allocation support was discussed. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/linux-nvdimm/x49shpp3zn8.fsf@segfault.boston.devel.redhat.com Link: https://lkml.kernel.org/r/159643101035.4062302.6785857915652647857.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106112801.30709.14601438735305335071.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
f5516ec5 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: make pgmap optional for instance creation The passed in dev_pagemap is only required in the pmem case as the libnvdimm core may have reserved a vmem_altmap for dev_memremap_pages() to place the memmap in pmem directly. In the hmem case there is no agent reserving an altmap so it can all be handled by a core internal default. Pass the resource range via a new @range property of 'struct dev_dax_data'. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643099958.4062302.10379230791041872886.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lkml.kernel.org/r/160106110513.30709.4303239334850606031.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
174ebece |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: move instance creation parameters to 'struct dev_dax_data' In preparation for adding more parameters to instance creation, move existing parameters to a new struct. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643099411.4062302.1337305960720423895.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
ec826909 |
|
13-Oct-2020 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: drop the dax_region.pfn_flags attribute All callers specify the same flags to alloc_dax_region(), so there is no need to allow for anything other than PFN_DEV|PFN_MAP, or carry a ->pfn_flags around on the region. Device-dax instances are always page backed. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Link: https://lkml.kernel.org/r/159643098829.4062302.13611520567669439046.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
4e4ced93 |
|
31-Mar-2020 |
Vivek Goyal <vgoyal@redhat.com> |
dax: Move mandatory ->zero_page_range() check in alloc_dax() zero_page_range() dax operation is mandatory for dax devices. Right now that check happens in dax_zero_page_range() function. Dan thinks that's too late and its better to do the check earlier in alloc_dax(). I also modified alloc_dax() to return pointer with error code in it in case of failure. Right now it returns NULL and caller assumes failure happened due to -ENOMEM. But with this ->zero_page_range() check, I need to return -EINVAL instead. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
cb4dd729 |
|
12-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
dax: Add numa_node to the default device-dax attributes It is confusing that device-dax instances publish a 'target_node' attribute, but not a 'numa_node'. The 'numa_node' information is available elsewhere in the sysfs device hierarchy, but it is not obvious and not reliable from one device-dax instance-type (e.g. child devices of nvdimm namespaces) to the next (e.g. 'hmem' devices defined by EFI Specific Purpose Memory and the ACPI HMAT). Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309906102.1582359.4262088001244476001.stgit@dwillia2-desk3.amr.corp.intel.com
|
#
153dd286 |
|
12-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
dax: Simplify root read-only definition for the 'resource' attribute Rather than update the permission in ->is_visible() set the permission directly at declaration time. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309904959.1582359.7281180042781955506.stgit@dwillia2-desk3.amr.corp.intel.com
|
#
770619a9 |
|
12-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
dax: Create a dax device_type Move the open coded release method and attribute groups to a 'struct device_type' instance. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/157309904365.1582359.5451327195246651379.stgit@dwillia2-desk3.amr.corp.intel.com
|
#
460370ab |
|
06-Nov-2019 |
Dan Williams <dan.j.williams@intel.com> |
dax: Fix alloc_dax_region() compile warning PFN flags are (unsigned long long), fix the alloc_dax_region() calling convention to fix warnings of the form: >> include/linux/pfn_t.h:18:17: warning: large integer implicitly truncated to unsigned type [-Woverflow] #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3)) Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
#
fefc1d97 |
|
05-Jul-2019 |
Pankaj Gupta <pagupta@redhat.com> |
libnvdimm: add dax_dev sync flag This patch adds 'DAXDEV_SYNC' flag which is set for nd_region doing synchronous flush. This later is used to disable MAP_SYNC functionality for ext4 & xfs filesystem for devices don't support synchronous flush. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
40cdc60a |
|
20-Jun-2019 |
Vishal Verma <vishal.l.verma@intel.com> |
device-dax: Add a 'resource' attribute device-dax based devices were missing a 'resource' attribute to indicate the physical address range contributed by the device in question. This information is desirable to userspace tooling that may want to use the dax device as system-ram, and wants to selectively hotplug and online the memory blocks associated with a given device. Without this, the tooling would have to parse /proc/iomem for the memory ranges contributed by dax devices, which can be a workaround, but it is far easier to provide this information in the sysfs hierarchy. Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
c347bd71 |
|
22-Feb-2019 |
Vishal Verma <vishal.l.verma@intel.com> |
device-dax: Add a 'modalias' attribute to DAX 'bus' devices Add a 'modalias' attribute to devices under the DAX bus so that userspace is able to dynamically load modules as needed. Normally, udev can get the modalias from 'uevent', and that is correctly set up by the DAX bus. However other tooling such as 'libndctl' for interacting with drivers/nvdimm/, and 'libdaxctl' for drivers/dax/ can also use the modalias to dynamically load modules via libkmod lookups. The 'nd' bus set up by the libnvdimm subsystem exports a modalias attribute. Imitate this to export the same for the 'dax' bus. Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
21c75763 |
|
20-Feb-2019 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Add a 'target_node' attribute The target-node attribute is the Linux numa-node that a device-dax instance may create when it is online. Prior to being online the device's 'numa_node' property reflects the closest online cpu node which is the typical expectation of a device 'numa_node'. Once it is online it becomes its own distinct numa node, i.e. 'target_node'. Export the 'target_node' property to give userspace tooling the ability to predict the effective numa-node from a device-dax instance configured to provide 'System RAM' capacity. Cc: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
664525b2 |
|
24-Jan-2019 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Auto-bind device after successful new_id The typical 'new_id' attribute behavior is to immediately attach a device to its driver after a new device-id is added. Implement this behavior for the dax bus. Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reported-by: Brice Goglin <Brice.Goglin@inria.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
8fc5c735 |
|
09-Nov-2018 |
Dan Williams <dan.j.williams@intel.com> |
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware Interface Table), is the first known instance of a memory range described by a unique "target" proximity domain. Where "initiator" and "target" proximity domains is an approach that the ACPI HMAT (Heterogeneous Memory Attributes Table) uses to described the unique performance properties of a memory range relative to a given initiator (e.g. CPU or DMA device). Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y char-device follows the traditional notion of 'numa-node' where the attribute conveys the closest online numa-node. That numa-node attribute is useful for cpu-binding and memory-binding processes *near* the device. However, when the memory range backing a 'pmem', or 'dax' device is onlined (memory hot-add) the memory-only-numa-node representing that address needs to be differentiated from the set of online nodes. In other words, the numa-node association of the device depends on whether you can bind processes *near* the cpu-numa-node in the offline device-case, or bind process *on* the memory-range directly after the backing address range is onlined. Allow for the case that platform firmware describes persistent memory with a unique proximity domain, i.e. when it is distinct from the proximity of DRAM and CPUs that are on the same socket. Plumb the Linux numa-node translation of that proximity through the libnvdimm region device to namespaces that are in device-dax mode. With this in place the proposed kmem driver [1] can optionally discover a unique numa-node number for the address range as it transitions the memory from an offline state managed by a device-driver to an online memory range managed by the core-mm. [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com Reported-by: Fan Du <fan.du@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Oliver O'Halloran" <oohall@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jérôme Glisse <jglisse@redhat.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
730926c3 |
|
16-Jul-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Add /sys/class/dax backwards compatibility On the expectation that some environments may not upgrade libdaxctl (userspace component that depends on the /sys/class/dax hierarchy), provide a default / legacy dax_pmem_compat driver. The dax_pmem_compat driver implements the original /sys/class/dax sysfs layout rather than /sys/bus/dax. When userspace is upgraded it can blacklist this module and switch to the dax_pmem driver going forward. CONFIG_DEV_DAX_PMEM_COMPAT and supporting code will be deleted according to the dax_pmem entry in Documentation/ABI/obsolete/. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
d200781e |
|
07-Nov-2018 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Add support for a dax override driver Introduce the 'new_id' concept for enabling a custom device-driver attach policy for dax-bus drivers. The intended use is to have a mechanism for hot-plugging device-dax ranges into the page allocator on-demand. With this in place the default policy of using device-dax for performance differentiated memory can be overridden by user-space policy that can arrange for the memory range to be managed as 'System RAM' with user-defined NUMA and other performance attributes. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
89ec9f2c |
|
29-Oct-2018 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Move resource pinning+mapping into the common driver Move the responsibility of calling devm_request_resource() and devm_memremap_pages() into the common device-dax driver. This is another preparatory step to allowing an alternate personality driver for a device-dax range. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
9567da0b |
|
12-Jul-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Introduce bus + driver model In support of multiple device-dax instances per device-dax-region and allowing the 'kmem' driver to attach to dax-instances instead of the current device-node access, convert the dax sub-system from a class to a bus. Recall that the kmem driver takes reserved / special purpose memories and assigns them to be managed by the core-mm. Aside from the fact the device-dax instances are registered and probed on a bus, two other lifetime-management changes are made: 1/ Delay attaching a cdev until driver probe time 2/ A new run_dax() helper is introduced to allow restoring dax-operation after a kill_dax() event. So, at driver ->probe() time we run_dax() and at ->remove() time we kill_dax() and invalidate all mappings. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
#
51cf784c |
|
12-Jul-2017 |
Dan Williams <dan.j.williams@intel.com> |
device-dax: Start defining a dax bus model Towards eliminating the dax_class, move the dax-device-attribute enabling to a new bus.c file in the core. The amount of code thrash of sub-sequent patches is reduced as no logic changes are made, just pure code movement. A temporary export of unregister_dex_dax() and dax_attribute_groups is needed to preserve compilation, but those symbols become static again in a follow-on patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|