History log of /linux-master/drivers/pci/p2pdma.c
Revision Date Author Comments
# 1e5c66af 04-Jan-2024 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

PCI/P2PDMA: Fix a sleeping issue in a RCU read section

It is not allowed to sleep within a RCU read section, so use GFP_ATOMIC
instead of GFP_KERNEL here.

Link: https://lore.kernel.org/r/02d9ec4a10235def0e764ff1f5be881ba12e16e8.1704397858.git.christophe.jaillet@wanadoo.fr
Fixes: ae21f835a5bd ("PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 805b196f 23-Oct-2023 Tadeusz Struk <tstruk@gmail.com>

PCI/P2PDMA: Remove redundant goto

Remove redundant goto in pci_alloc_p2pmem().

Link: https://lore.kernel.org/r/20231023084050.55230-1-tstruk@gmail.com
Signed-off-by: Tadeusz Struk <tstruk@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 4a7ce833 02-Oct-2023 Gustavo A. R. Silva <gustavoars@kernel.org>

PCI/P2PDMA: Fix undefined behavior bug in struct pci_p2pdma_pagemap

Struct dev_pagemap is a flexible structure, which means that it contains a
flexible-array member. If dev_pagemap.nr_range > 1, the memory following
the dev_pagemap could be overwritten.

This is currently not an issue because pci_p2pdma_pagemap is not exposed
outside p2pdma.c, and p2pdma.c only sets dev_pagemap.nr_range to 1.

To prevent problems if p2pdma.c ever uses nr_range > 1, move the flexible
struct dev_pagemap to the end of struct pci_p2pdma_pagemap.

-Wflex-array-member-not-at-end is coming in GCC-14, and we are getting
ready to enable it globally.

Link: https://lore.kernel.org/r/ZRsUL/hATNruwtla@work
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 86b4ad7d 24-Aug-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fix typos in docs and comments

Fix typos in docs and comments.

Link: https://lore.kernel.org/r/20230824193712.542167-11-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>


# 0e8207f5 11-Aug-2023 Zheng Zengkai <zhengzengkai@huawei.com>

PCI/P2PDMA: Use pci_dev_id() to simplify the code

When we have a struct pci_dev *, use pci_dev_id() instead of manually
composing the ID from dev->bus->number and dev->devfn.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20230811111057.31900-1-zhengzengkai@huawei.com
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 5376ced5 28-Mar-2023 Cai Huoqing <cai.huoqing@linux.dev>

PCI/P2PDMA: Fix pci_p2pmem_find_many() kernel-doc

Remove reference to pci_p2pmem_dma(), which has never existed.

Link: https://lore.kernel.org/r/20230329024731.5604-1-cai.huoqing@linux.dev
Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 6606f4c3 09-Feb-2023 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Annotate RCU dereference

A dereference of the __rcu pointer was noticed by sparse:

drivers/pci/p2pdma.c:199:44: sparse: sparse: dereference of noderef expression

Dereference the __rcu pointer using rcu_dereference_protected() instead of
accessing it directly. It's safe to use rcu_dereference_protected() because
a reference is held on the pgmap's percpu reference counter and thus it
cannot disappear.

Link: https://lore.kernel.org/r/20230209172953.4597-1-logang@deltatee.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>


# 8032bf12 09-Oct-2022 Jason A. Donenfeld <Jason@zx2c4.com>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>


# 7e9c7ef8 21-Oct-2022 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Allow userspace VMA allocations through sysfs

Create a sysfs bin attribute called "allocate" under the existing
"p2pmem" group. The only allowable operation on this file is the mmap()
call.

When mmap() is called on this attribute, the kernel allocates a chunk of
memory from the genalloc and inserts the pages into the VMA. The
dev_pagemap .page_free callback will indicate when these pages are no
longer used and they will be put back into the genalloc.

On device unbind, remove the sysfs file before the memremap_pages are
cleaned up. This ensures unmap_mapping_range() is called on the files
inode and no new mappings can be created.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20221021174116.7200-9-logang@deltatee.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# e01bae16 16-Sep-2022 Yang Yingliang <yangyingliang@huawei.com>

PCI/P2PDMA: Use for_each_pci_dev() helper

Use for_each_pci_dev() instead of open-coding it. No functional change.

Link: https://lore.kernel.org/r/20220916140329.679633-1-yangyingliang@huawei.com
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 0d06132f 08-Jul-2022 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()

This interface is superseded by support in dma_map_sg() which now supports
heterogeneous scatterlists. There are no longer any users, so remove it.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 5e180ff3 08-Jul-2022 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Introduce helpers for dma_map_sg implementations

Add pci_p2pdma_map_segment() as a helper for dma_map_sg()
implementations. It takes an scatterlist segment that must point to a
pci_p2pdma struct page and will map it if the mapping requires a bus
address.

The return value indicates whether the mapping required a bus address
or whether the caller still needs to map the segment normally. If the
segment should not be mapped, -EREMOTEIO is returned.

This helper uses a state structure to track the changes to the
pgmap across calls and avoid needing to lookup into the xarray for
every page.

The prototype for the helper is added to dma-map-ops.h as it is only
useful to dma map implementations and don't need to pollute the public
pci-p2pdma header.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 719c9865 08-Jul-2022 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Attempt to set map_type if it has not been set

Attempt to find the mapping type for P2PDMA pages on the first
DMA map attempt if it has not been done ahead of time.

Previously, the mapping type was expected to be calculated ahead of
time, but if pages are to come from userspace then there's no
way to ensure the path was checked ahead of time.

This change will calculate the mapping type if it hasn't pre-calculated
so it is no longer invalid to call pci_p2pdma_map_sg() before the mapping
type is calculated, so drop the WARN_ON when that is the case.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 1af7c26c 10-Apr-2022 Shlomo Pongratz <shlomopongratz@gmail.com>

PCI/P2PDMA: Whitelist Intel Skylake-E Root Ports at any devfn

In 7b94b53db34f ("PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to
the whitelist"), Andrew Maier added Skylake-E 2031, 2032, and 2033 Root
Ports to the pci_p2pdma_whitelist[], so we assume P2PDMA between devices
below these ports works.

Previously we only checked the whitelist for a device at devfn 00.0 on the
root bus, which is often a "host bridge". But these Skylake Root Ports may
be at any devfn and there may be no "host bridge" device.

Generalize pci_host_bridge_dev() so we check the first device on the root
bus, whether it is devfn 00.0 or a PCIe Root Port, against the whitelist.

[bhelgaas: commit log, comment]
Link: https://lore.kernel.org/r/20220410105213.690-2-shlomop@pliops.com
Tested-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Shlomo Pongratz <shlomop@pliops.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Andrew Maier <andrew.maier@eideticom.com>


# feaea1fe 09-Feb-2022 Michael J. Ruhl <michael.j.ruhl@intel.com>

PCI/P2PDMA: Add Intel 3rd Gen Intel Xeon Scalable Processors to whitelist

In order to do P2P communication the bridge ID of the platform must be in
the P2P device table.

Update the P2P device table with a device ID for the 3rd Gen Intel Xeon
Scalable Processors.

Link: https://lore.kernel.org/r/20220209162801.7647-1-michael.j.ruhl@intel.com
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>


# 69f457b1 03-Nov-2021 Christophe JAILLET <christophe.jaillet@wanadoo.fr>

PCI/P2PDMA: Use percpu_ref_tryget_live_rcu() inside RCU critical section

Since pci_alloc_p2pmem() has already called rcu_read_lock(), we're in an
RCU read-side critical section and don't need to take the lock again. Use
percpu_ref_tryget_live_rcu() instead of percpu_ref_tryget_live() to save a
few cycles.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/ab80164f4d5b32f9e6240aa4863c3a147ff9c89f.1635974126.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# b80892ca 28-Oct-2021 Christoph Hellwig <hch@lst.de>

memremap: remove support for external pgmap refcounts

No driver is left using the external pgmap refcount, so remove the
code to support it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211028151017.50234-1-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# e0f7b192 15-Sep-2021 Krzysztof Wilczyński <kw@linux.com>

PCI: Use kstrtobool() directly, sans strtobool() wrapper

strtobool() is a wrapper around kstrtobool() that has been added for
backward compatibility.

There is no reason to use the old API, so use kstrtobool() directly.

Related: ef951599074b ("lib: move strtobool() to kstrtobool()")

Link: https://lore.kernel.org/r/20210915230127.2495723-3-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3a194079 08-Sep-2021 Wang Lu <wanglu@dapustor.com>

PCI/P2PDMA: Apply bus offset correctly in DMA address calculation

The bus offset is bus address - physical address, so the calculation in
__pci_p2pdma_map_sg should be: bus address = physical address + bus offset.

Correct the dma_address computation in __pci_p2pdma_map_sg().

Link: https://lore.kernel.org/r/20210909032528.24517-1-wanglu@dapustor.com
Signed-off-by: Wang Lu <wanglu@dapustor.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# ae21f835 01-Jul-2021 Eric Dumazet <edumazet@google.com>

PCI/P2PDMA: Finish RCU conversion of pdev->p2pdma

While looking at pci_alloc_p2pmem() I found RCU protection was not properly
applied there, as pdev->p2pdma was potentially read multiple times.

Fix pci_alloc_p2pmem(), add __rcu qualifier to p2pdma field of struct
pci_dev, and fix all other accesses to this field with proper RCU verbs.

Link: https://lore.kernel.org/r/20210701095621.3129283-1-eric.dumazet@gmail.com
Fixes: 1570175abd16 ("PCI/P2PDMA: track pgmap references per resource, not globally")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>


# d1b8dc09 13-Jun-2021 Christoph Hellwig <hch@lst.de>

PCI/P2PDMA: Simplify distance calculation

Merge __calc_map_type_and_dist() and calc_map_type_and_dist_warn() into
calc_map_type_and_dist() to simplify the code a bit. This now means we add
the devfn strings to the acs_buf unconditionally even if the buffer is not
printed, but that is not a lot of overhead and keeps the code much simpler.

Link: https://lore.kernel.org/r/20210614055310.3960791-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 3ec0c3ec 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Avoid pci_get_slot(), which may sleep

In order to use upstream_bridge_distance_warn() from a dma_map function, it
must not sleep. However, pci_get_slot() takes the pci_bus_sem so it might
sleep.

In order to avoid this, try to get the host bridge's device from the first
element in the device list. It should be impossible for the host bridge's
device to go away while references are held on child devices, so the first
element should not be able to change and, thus, this should be safe.

Introduce a static function called pci_host_bridge_dev() to obtain the host
bridge's root device.

Link: https://lore.kernel.org/r/20210610160609.28447-7-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7e2faa17 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Refactor pci_p2pdma_map_type()

All callers of pci_p2pdma_map_type() have a struct dev_pgmap and a struct
device (of the client doing the DMA transfer). Thus move the conversion to
struct pci_devs for the provider and client into this function.

Link: https://lore.kernel.org/r/20210610160609.28447-6-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# cf201bfe 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Warn if host bridge not in whitelist

If the host bridge is not in the whitelist print a warning in the
calc_map_type_and_dist_warn() path detailing the vendor and device IDs that
would need to be added to the whitelist.

Suggested-by: Don Dutile <ddutile@redhat.com>
Link: https://lore.kernel.org/r/20210610160609.28447-5-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f9c125b9 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Use correct calc_map_type_and_dist() return type

Instead of using an int for the return value of this function, use the
correct enum pci_p2pdma_map_type.

Link: https://lore.kernel.org/r/20210610160609.28447-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e4ece59a 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Collect acs list in stack buffer to avoid sleeping

In order to call the calc_map_type_and_dist_warn() function from a dma_map
operation, the function must not sleep. The only reason it sleeps is to
allocate memory for the seq_buf to print a verbose warning telling the user
how to disable ACS for that path.

Instead of allocating the memory with kmalloc(), allocate a smaller buffer
on the stack. A 128 byte buffer is enough to print 10 PCI device names. A
system with 10 bridge ports between two devices that have ACS enabled would
be unusually large, so this should still be a reasonable limit.

This also cleans up the awkward (and broken) return with -ENOMEM which
contradicts the return type and the caller was not prepared for.

Link: https://lore.kernel.org/r/20210610160609.28447-3-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6389d437 10-Jun-2021 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Rename upstream_bridge_distance() and rework doc

The function upstream_bridge_distance() has evolved such that its name is
no longer entirely reflective of what the function does. It not only
calculates the distance between two peers but also calculates how the DMA
addresses for those two peers should be mapped.

Rename it to calc_map_type_and_dist() and rework the documentation to
better describe the two pieces of information the function returns.

[bhelgaas: tweak comment wording]
Link: https://lore.kernel.org/r/20210610160609.28447-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f8cf6e51 02-Jun-2021 Krzysztof Wilczyński <kw@linux.com>

PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions

The sysfs_emit() and sysfs_emit_at() functions were introduced to make
it less ambiguous which function is preferred when writing to the output
buffer in a device attribute's "show" callback [1].

Convert the PCI sysfs object "show" functions from sprintf(), snprintf()
and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the
latter is aware of the PAGE_SIZE buffer and correctly returns the number
of bytes written into the buffer.

No functional change intended.

[1] Documentation/filesystems/sysfs.rst

Related commit: ad025f8e46f3 ("PCI/sysfs: Use sysfs_emit() and
sysfs_emit_at() in "show" functions").

Link: https://lore.kernel.org/r/20210603000112.703037-2-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 2f0cd59c 23-Oct-2020 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

PCI: Fix kernel-doc markup

Update kernel-doc so the names in the doc match the prototypes.

Link: https://lore.kernel.org/r/f19caf7a68f8365c8b573a42b4ac89ec21925c73.1603469755.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 73063ec5 06-Nov-2020 Christoph Hellwig <hch@lst.de>

PCI/P2PDMA: Cleanup __pci_p2pdma_map_sg a bit

Remove the pointless paddr variable that was only used once.

Link: https://lore.kernel.org/r/20201106181941.1878556-10-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# 4d34d52c 06-Nov-2020 Christoph Hellwig <hch@lst.de>

PCI/P2PDMA: Remove the DMA_VIRT_OPS hacks

Now that all users of dma_virt_ops are gone we can remove the workaround
for it in the PCI peer to peer code.

Link: https://lore.kernel.org/r/20201106181941.1878556-9-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>


# b7b3c01b 13-Oct-2020 Dan Williams <dan.j.williams@intel.com>

mm/memremap_pages: support multiple ranges per invocation

In support of device-dax growing the ability to front physically
dis-contiguous ranges of memory, update devm_memremap_pages() to track
multiple ranges with a single reference counter and devm instance.

Convert all [devm_]memremap_pages() users to specify the number of ranges
they are mapping in their 'struct dev_pagemap' instance.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.co
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103789.4062302.18426128170217903785.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106116293.30709.13350662794915396198.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# a4574f63 13-Oct-2020 Dan Williams <dan.j.williams@intel.com>

mm/memremap_pages: convert to 'struct range'

The 'struct resource' in 'struct dev_pagemap' is only used for holding
resource span information. The other fields, 'name', 'flags', 'desc',
'parent', 'sibling', and 'child' are all unused wasted space.

This is in preparation for introducing a multi-range extension of
devm_memremap_pages().

The bulk of this change is unwinding all the places internal to libnvdimm
that used 'struct resource' unnecessarily, and replacing instances of
'struct dev_pagemap'.res with 'struct dev_pagemap'.range.

P2PDMA had a minor usage of the resource flags field, but only to report
failures with "%pR". That is replaced with an open coded print of the
range.

[dan.carpenter@oracle.com: mm/hmm/test: use after free in dmirror_allocate_chunk()]
Link: https://lkml.kernel.org/r/20200926121402.GA7467@kadam

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> [xen]
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brice Goglin <Brice.Goglin@inria.fr>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hulk Robot <hulkci@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Yan <yanaijie@huawei.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jia He <justin.he@arm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/159643103173.4062302.768998885691711532.stgit@dwillia2-desk3.amr.corp.intel.com
Link: https://lkml.kernel.org/r/160106115761.30709.13539840236873663620.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# eec240e2 20-Sep-2020 Julia Lawall <Julia.Lawall@inria.fr>

PCI/P2PDMA: Drop double zeroing for sg_init_table()

sg_init_table() zeroes its first argument, so the allocation of that
argument doesn't have to.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x;
@@

x =
- kzalloc
+ kmalloc
(...)
...
sg_init_table(x,...)
// </smpl>

Link: https://lore.kernel.org/r/1600601186-7420-15-git-send-email-Julia.Lawall@inria.fr
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e7a7499d 24-Aug-2020 Krzysztof Wilczyński <kw@linux.com>

PCI: Use scnprintf(), not snprintf(), in sysfs "show" functions

Sysfs "show" methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf() [1].

snprintf(buf, size, ...) prints at most "size" bytes into "buf", but
returns the number of bytes that *would* be printed if "buf" were large
enough.

Replace use of snprintf() with scnprintf(). No functional change intended.

Related:
https://patchwork.kernel.org/patch/9946759/#20969333
https://lwn.net/Articles/69419

[1] Documentation/filesystems/sysfs.rst

[bhelgaas: squashed, commit log]
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20200824233918.26306-2-kw@linux.com
Link: https://lore.kernel.org/r/20200824233918.26306-3-kw@linux.com
Link: https://lore.kernel.org/r/20200824233918.26306-4-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7c2308f7 10-Aug-2020 Christoph Hellwig <hch@lst.de>

PCI/P2PDMA: Fix build without DMA ops

My commit to make DMA ops support optional missed the reference in
the p2pdma code. And while the build bot didn't manage to find a config
where this can happen, Matthew did. Fix this by replacing two IS_ENABLED
checks with ifdefs.

Fixes: 2f9237d4f6df ("dma-mapping: make support for dma ops optional")
Link: https://lore.kernel.org/r/20200810124843.1532738-1-hch@lst.de
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# dea286bb 29-Jul-2020 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Allow P2PDMA on AMD Zen and newer CPUs

Allow P2PDMA if the CPU vendor is AMD and family is 0x17 (Zen) or greater.

[bhelgaas: commit log, simplify #if/#else/#endif]
Link: https://lore.kernel.org/r/20200729231844.4653-1-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>


# 52fbf5bd 07-Jul-2020 Rajat Jain <rajatja@google.com>

PCI: Cache ACS capability offset in device

Currently the ACS capability is being looked up at a number of places. Read
and store it once at enumeration so that it can be used by all later. No
functional change intended.

Link: https://lore.kernel.org/r/20200707224604.3737893-2-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7d5b10fc 06-Apr-2020 Alex Deucher <alexdeucher@gmail.com>

PCI/P2PDMA: Add AMD Zen Raven and Renoir Root Ports to whitelist

According to the hardware architect, pre-Zen parts support p2p writes and
Zen parts support both p2p reads and writes.

Add entries for Zen parts Raven (0x15d0) and Renoir (0x1630).

Link: https://lore.kernel.org/r/20200406194201.846411-1-alexander.deucher@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>


# 7b94b53d 07-Feb-2020 Andrew Maier <andrew.maier@eideticom.com>

PCI/P2PDMA: Add Intel Sky Lake-E Root Ports B, C, D to the whitelist

Add the three remaining Intel Sky Lake-E host Root Ports to the whitelist
of p2pdma.

P2P has been tested and is working on this system.

Link: https://lore.kernel.org/r/20200207221219.4309-1-andrew.maier@eideticom.com
Signed-off-by: Andrew Maier <andrew.maier@eideticom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# bc123a51 06-Dec-2019 Armen Baloyan <abaloyan@gigaio.com>

PCI/P2PDMA: Add Intel SkyLake-E to the whitelist

Intel SkyLake-E was successfully tested for p2pdma transactions spanning
over a host bridge and PCI bridge with IOMMU on. Add it to the whitelist.

Link: https://lore.kernel.org/r/1575669165-31697-1-git-send-email-abaloyan@gigaio.com
Signed-off-by: Armen Baloyan <abaloyan@gigaio.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 27235b15 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Update pci_p2pdma_distance_many() documentation

The comment describing pci_p2pdma_distance_many() still referred to
the devices being behind the same root port. This no longer applies
so reword the documentation.

Link: https://lore.kernel.org/r/20190730163545.4915-15-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-15-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 5daf40b0 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Allow IOMMU for host bridge whitelist

Now that we map the requests correctly we can remove the iommu_present()
restriction.

Link: https://lore.kernel.org/r/20190730163545.4915-14-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-14-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 5d52e1ab 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: dma_map() requests that traverse the host bridge

Any requests that traverse the host bridge will need to be mapped into the
IOMMU, so call dma_map_sg() inside pci_p2pdma_map_sg() when appropriate.

Similarly, call dma_unmap_sg() inside pci_p2pdma_unmap_sg().

Link: https://lore.kernel.org/r/20190730163545.4915-13-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-13-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 110203be 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Store mapping method in an xarray

When upstream_bridge_distance() is called, store the method required to map
the DMA transfers in an xarray so it can be looked up efficiently on the
hot path in pci_p2pdma_map_sg().

Link: https://lore.kernel.org/r/20190730163545.4915-12-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-12-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 142c242e 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Factor out __pci_p2pdma_map_sg()

Factor out the bus-only mapping into its own static function. No
functional changes. The original pci_p2pdma_map_sg_attrs() will be used to
decide whether this is an appropriate way to map.

Link: https://lore.kernel.org/r/20190730163545.4915-11-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-11-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 7f73eac3 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Introduce pci_p2pdma_unmap_sg()

Add pci_p2pdma_unmap_sg() to the two places that call pci_p2pdma_map_sg().

This is a prep patch to introduce correct mappings for p2pdma transactions
that go through the root complex.

Link: https://lore.kernel.org/r/20190730163545.4915-10-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-10-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 2b9f4bb2 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Add attrs argument to pci_p2pdma_map_sg()

This is to match the dma_map_sg() API which this function will have to call
in an future patch.

Add a pci_p2pdma_map_sg_attrs() function and helper to call it with no
attributes just like the dma_map_sg() function.

Link: https://lore.kernel.org/r/20190730163545.4915-9-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-9-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 494d63b0 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Whitelist some Intel host bridges

Intel devices do not have good support for P2P requests that span different
host bridges as the transactions will cross the QPI/UPI bus and this does
not perform well.

Therefore, enable support for these devices only if the host bridges match.

Add Intel devices that have been tested and are known to work. There are
likely many others out there that will need to be tested and added.

Link: https://lore.kernel.org/r/20190730163545.4915-8-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-8-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 2c84d818 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Factor out host_bridge_whitelist()

Push both PCI devices into the whitelist checking function seeing some
hardware will require us ensuring they are on the same host bridge.

At the same time we rename root_complex_whitelist() to
host_bridge_whitelist() to match the terminology used in the code.

Link: https://lore.kernel.org/r/20190730163545.4915-7-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-7-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# e2cbfbf7 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Apply host bridge whitelist for ACS

When a P2PDMA transfer is rejected due to ACS being set, we can also check
the whitelist and allow the transactions.

Do this by pushing the whitelist check into the upstream_bridge_distance()
function.

Link: https://lore.kernel.org/r/20190730163545.4915-6-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-6-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# c6bfaeb5 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Factor out __upstream_bridge_distance()

This is a prep patch to create a second level helper. There are no
functional changes.

The root complex whitelist code will be moved into this function in a
subsequent patch.

Link: https://lore.kernel.org/r/20190730163545.4915-5-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-5-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 72583084 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Add constants for map type results to upstream_bridge_distance()

Add constant flags to indicate how two devices will be mapped or if they
are unsupported. upstream_bridge_distance() will now return the
mapping type and the distance in a passed-by-reference argument.

This helps annotate the code better, but the main reason is so we can use
the information to store the required mapping method in an xarray.

Link: https://lore.kernel.org/r/20190730163545.4915-4-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 0afea381 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Add provider's pci_dev to pci_p2pdma_pagemap struct

The provider will be needed to figure out how to map a device.

Link: https://lore.kernel.org/r/20190730163545.4915-3-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-3-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# a6e6fe65 12-Aug-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Introduce private pagemap structure

Move the PCI bus offset from the generic dev_pagemap structure to a
specific pci_p2pdma_pagemap structure.

This structure will grow in subsequent patches.

Link: https://lore.kernel.org/r/20190730163545.4915-2-logang@deltatee.com
Link: https://lore.kernel.org/r/20190812173048.9186-2-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# f6b6aefe 30-May-2019 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fix typos and whitespace errors

Fix typos in drivers/pci. Comment and whitespace changes only.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>


# 9c002bb6 02-Jul-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Fix missing check for dma_virt_ops

Drivers that use dma_virt_ops were meant to be rejected when testing
compatibility for P2PDMA.

This check got inadvertently dropped in one of the later versions of the
original patchset, so add it back.

Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory")
Link: https://lore.kernel.org/r/20190702173544.21950-1-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d0b3517d 26-Jun-2019 Christoph Hellwig <hch@lst.de>

PCI/P2PDMA: use the dev_pagemap internal refcount

The functionality is identical to the one currently open coded in
p2pdma.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# d8668bb0 26-Jun-2019 Christoph Hellwig <hch@lst.de>

memremap: pass a struct dev_pagemap to ->kill and ->cleanup

Passing the actual typed structure leads to more understandable code
vs just passing the ref member.

Reported-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 1e240e8d 26-Jun-2019 Christoph Hellwig <hch@lst.de>

memremap: move dev_pagemap callbacks into a separate structure

The dev_pagemap is a growing too many callbacks. Move them into a
separate ops structure so that they are not duplicated for multiple
instances, and an attacker can't easily overwrite them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>


# 6dbbd053 19-Jun-2019 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present

Presently, there is no path to DMA map P2PDMA memory, so if a TLP targeting
this memory hits the root complex and an IOMMU is present, the IOMMU will
reject the transaction, even if the RC would support P2PDMA.

So until the kernel knows to map these DMA addresses in the IOMMU, we
should not enable the whitelist when an IOMMU is present.

Link: https://lore.kernel.org/linux-pci/20190522201252.2997-1-logang@deltatee.com/
Fixes: 0f97da831026 ("PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>


# 50f44ee7 13-Jun-2019 Dan Williams <dan.j.williams@intel.com>

mm/devm_memremap_pages: fix final page put race

Logan noticed that devm_memremap_pages_release() kills the percpu_ref
drops all the page references that were acquired at init and then
immediately proceeds to unplug, arch_remove_memory(), the backing pages
for the pagemap. If for some reason device shutdown actually collides
with a busy / elevated-ref-count page then arch_remove_memory() should
be deferred until after that reference is dropped.

As it stands the "wait for last page ref drop" happens *after*
devm_memremap_pages_release() returns, which is obviously too late and
can lead to crashes.

Fix this situation by assigning the responsibility to wait for the
percpu_ref to go idle to devm_memremap_pages() with a new ->cleanup()
callback. Implement the new cleanup callback for all
devm_memremap_pages() users: pmem, devdax, hmm, and p2pdma.

Link: http://lkml.kernel.org/r/155727339156.292046.5432007428235387859.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 41e94a851304 ("add devm_memremap_pages")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1570175a 13-Jun-2019 Dan Williams <dan.j.williams@intel.com>

PCI/P2PDMA: track pgmap references per resource, not globally

In preparation for fixing a race between devm_memremap_pages_release()
and the final put of a page from the device-page-map, allocate a
percpu-ref per p2pdma resource mapping.

Link: http://lkml.kernel.org/r/155727338646.292046.9922678317501435597.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# e615a191 13-Jun-2019 Dan Williams <dan.j.williams@intel.com>

PCI/P2PDMA: fix the gen_pool_add_virt() failure path

The pci_p2pdma_add_resource() implementation immediately frees the pgmap
if gen_pool_add_virt() fails. However, that means that when @dev
triggers a devres release devm_memremap_pages_release() will crash
trying to access the freed @pgmap.

Use the new devm_memunmap_pages() to manually free the mapping in the
error path.

Link: http://lkml.kernel.org/r/155727337603.292046.13101332703665246702.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Fixes: 52916982af48 ("PCI/P2PDMA: Support peer-to-peer memory")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 0f97da83 18-Apr-2019 Christian König <ckoenig.leichtzumerken@gmail.com>

PCI/P2PDMA: Allow P2P DMA between any devices under AMD ZEN Root Complex

The PCI specs say that peer-to-peer DMA should be supported between any two
devices that have a common upstream PCI-to-PCI bridge. But devices under
different Root Ports don't share a common upstream bridge, and PCIe r4.0,
sec 1.3.1, says routing peer-to-peer transactions between Root Ports in the
same Root Complex is optional.

Many Root Complexes, including AMD ZEN, *do* support peer-to-peer DMA even
between Root Ports. Add a whitelist and allow peer-to-peer DMA if both
participants are attached to a Root Complex known to support it.

Link: https://lore.kernel.org/linux-pci/20190418115859.2394-1-christian.koenig@amd.com
Signed-off-by: Christian König <christian.koenig@amd.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 02917e9f 28-Dec-2018 Dan Williams <dan.j.williams@intel.com>

mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL

At Maintainer Summit, Greg brought up a topic I proposed around
EXPORT_SYMBOL_GPL usage. The motivation was considerations for when
EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional
step of reclassifying an existing export. Specifically, I wanted to make
the case that although the line is fuzzy and hard to specify in abstract
terms, it is nonetheless clear that devm_memremap_pages() and HMM
(Heterogeneous Memory Management) have crossed it. The
devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the
beginning, and HMM as a derivative of that functionality should have
naturally picked up that designation as well.

Contrary to typical rules, the HMM infrastructure was merged upstream with
zero in-tree consumers. There was a promise at the time that those users
would be merged "soon", but it has been over a year with no drivers
arriving. While the Nouveau driver is about to belatedly make good on
that promise it is clear that HMM was targeted first and foremost at an
out-of-tree consumer.

HMM is derived from devm_memremap_pages(), a facility Christoph and I
spearheaded to support persistent memory. It combines a device lifetime
model with a dynamically created 'struct page' / memmap array for any
physical address range. It enables coordination and control of the many
code paths in the kernel built to interact with memory via 'struct page'
objects. With HMM the integration goes even deeper by allowing device
drivers to hook and manipulate page fault and page free events.

One interpretation of when EXPORT_SYMBOL is suitable is when it is
exporting stable and generic leaf functionality. The
devm_memremap_pages() facility continues to see expanding use cases,
peer-to-peer DMA being the most recent, with no clear end date when it
will stop attracting reworks and semantic changes. It is not suitable to
export devm_memremap_pages() as a stable 3rd party driver API due to the
fact that it is still changing and manipulates core behavior. Moreover,
it is not in the best interest of the long term development of the core
memory management subsystem to permit any external driver to effectively
define its own system-wide memory management policies with no
encouragement to engage with upstream.

I am also concerned that HMM was designed in a way to minimize further
engagement with the core-MM. That, with these hooks in place,
device-drivers are free to implement their own policies without much
consideration for whether and how the core-MM could grow to meet that
need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the
core-MM should be allowed the opportunity and stimulus to change and
address these new use cases as first class functionality.

Original changelog:

hmm_devmem_add(), and hmm_devmem_add_resource() duplicated
devm_memremap_pages() and are now simple now wrappers around the core
facility to inject a dev_pagemap instance into the global pgmap_radix and
hook page-idle events. The devm_memremap_pages() interface is base
infrastructure for HMM. HMM has more and deeper ties into the kernel
memory management implementation than base ZONE_DEVICE which is itself a
EXPORT_SYMBOL_GPL facility.

Originally, the HMM page structure creation routines copied the
devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the
implementations was discussed during the initial review:
http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to
extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled
this cleanup to move forward.

In addition to the integration with devm_memremap_pages() HMM depends on
other GPL-only symbols:

mmu_notifier_unregister_no_release
percpu_ref
region_intersects
__class_create

It goes further to consume / indirectly expose functionality that is not
exported to any other driver:

alloc_pages_vma
walk_page_range

HMM is derived from devm_memremap_pages(), and extends deep core-kernel
fundamentals. Similar to devm_memremap_pages(), mark its entry points
EXPORT_SYMBOL_GPL().

[logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()]
Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com
Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Balbir Singh <bsingharora@gmail.com>,
Cc: Michal Hocko <mhocko@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# fcf9ab35 01-Dec-2018 Randy Dunlap <rdunlap@infradead.org>

PCI/P2PDMA: Clean up documentation and kernel-doc

Fix typos, spellos, and grammar in p2pdma.rst and p2pdma.c.

Fix return value(s) in function pci_p2pmem_alloc_sgl().

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Jonathan Corbet <corbet@lwn.net>


# 2d7bc010 04-Oct-2018 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers

Users of the P2PDMA infrastructure will typically need a way for the user
to tell the kernel to use P2P resources. Typically this will be a simple
on/off boolean operation but sometimes it may be desirable for the user to
specify the exact device to use for the P2P operation.

Add new helpers for attributes which take a boolean or a PCI device. Any
boolean as accepted by strtobool() turn P2P on or off (such as 'y', 'n',
'1', '0', etc). Specifying a full PCI device name/BDF will select the
specific device.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 977196b8 04-Oct-2018 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset

The DMA address used when mapping PCI P2P memory must be the PCI bus
address. Thus, introduce pci_p2pmem_map_sg() to map the correct addresses
when using P2P memory.

Memory mapped in this way does not need to be unmapped and thus if we
provided pci_p2pmem_unmap_sg() it would be empty. This breaks the expected
balance between map/unmap but was left out as an empty function doesn't
really provide any benefit. In the future, if this call becomes necessary
it can be added without much difficulty.

For this, we assume that an SGL passed to these functions contain all P2P
memory or no P2P memory.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# cbb8ca69 04-Oct-2018 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Add sysfs group to display p2pmem stats

Add a sysfs group to display statistics about P2P memory that is registered
in each PCI device.

Attributes in the group display the total amount of P2P memory, the amount
available and whether it is published or not.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>


# 52916982 04-Oct-2018 Logan Gunthorpe <logang@deltatee.com>

PCI/P2PDMA: Support peer-to-peer memory

Some PCI devices may have memory mapped in a BAR space that's intended for
use in peer-to-peer transactions. To enable such transactions the memory
must be registered with ZONE_DEVICE pages so it can be used by DMA
interfaces in existing drivers.

Add an interface for other subsystems to find and allocate chunks of P2P
memory as necessary to facilitate transfers between two PCI peers:

struct pci_dev *pci_p2pmem_find[_many]();
int pci_p2pdma_distance[_many]();
void *pci_alloc_p2pmem();

The new interface requires a driver to collect a list of client devices
involved in the transaction then call pci_p2pmem_find() to obtain any
suitable P2P memory. Alternatively, if the caller knows a device which
provides P2P memory, they can use pci_p2pdma_distance() to determine if it
is usable. With a suitable p2pmem device, memory can then be allocated
with pci_alloc_p2pmem() for use in DMA transactions.

Depending on hardware, using peer-to-peer memory may reduce the bandwidth
of the transfer but can significantly reduce pressure on system memory.
This may be desirable in many cases: for example a system could be designed
with a small CPU connected to a PCIe switch by a small number of lanes
which would maximize the number of lanes available to connect to NVMe
devices.

The code is designed to only utilize the p2pmem device if all the devices
involved in a transfer are behind the same PCI bridge. This is because we
have no way of knowing whether peer-to-peer routing between PCIe Root Ports
is supported (PCIe r4.0, sec 1.3.1). Additionally, the benefits of P2P
transfers that go through the RC is limited to only reducing DRAM usage
and, in some cases, coding convenience. The PCI-SIG may be exploring
adding a new capability bit to advertise whether this is possible for
future hardware.

This commit includes significant rework and feedback from Christoph
Hellwig.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
[bhelgaas: fold in fix from Keith Busch <keith.busch@intel.com>:
https://lore.kernel.org/linux-pci/20181012155920.15418-1-keith.busch@intel.com,
to address comment from Dan Carpenter <dan.carpenter@oracle.com>, fold in
https://lore.kernel.org/linux-pci/20181017160510.17926-1-logang@deltatee.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>