History log of /linux-master/drivers/pci/probe.c
Revision Date Author Comments
# 17423360 23-Feb-2024 David E. Box <david.e.box@linux.intel.com>

PCI/ASPM: Save L1 PM Substates Capability for suspend/resume

4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") restored the L1 PM Substates Capability after resume,
which reduced power consumption by making the ASPM L1.x states work after
resume.

a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"") reverted 4ff116d0d5fd because resume failed on some
systems, so power consumption after resume increased again.

a7152be79b62 mentioned that we restore L1 PM substate configuration even
though ASPM L1 may already be enabled. This is due the fact that the
pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state().

Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec
5.5.4 more closely by:

1) Do not restore ASPM configuration in pci_restore_pcie_state() but
do that after PCIe capability is restored in pci_restore_aspm_state()
following PCIe r6.1, sec 5.5.4.

2) If BIOS reenables L1SS, particularly L1.2, we need to clear the
enables in the right order, downstream before upstream. Defer
restoring the L1SS config until we are at the downstream component.
Then update the config for both ends of the link in the prescribed
order.

3) Program ASPM L1 PM substate configuration before L1 enables.

4) Program ASPM L1 PM substate enables last, after rest of the fields
in the capability are programmed.

[bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores
in pci_restore_pcie_state()]

Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Co-developed-by: David E. Box <david.e.box@linux.intel.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA
Cc: Mark Enriquez <enriquezmark36@gmail.com>
Cc: Thomas Witt <kernel@witt.link>
Cc: Werner Sembach <wse@tuxedocomputers.com>
Cc: Vidya Sagar <vidyas@nvidia.com>


# fa84f443 23-Feb-2024 David E. Box <david.e.box@linux.intel.com>

PCI/ASPM: Move pci_configure_ltr() to aspm.c

The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2
state and is only configured when CONFIG_PCIEASPM is set.

Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since
they only build when CONFIG_PCIEASPM is set. No functional change
intended.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com
[bhelgaas: commit log, split build change from function moves]
Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# be9c3a4c 30-Oct-2023 Lukas Wunner <lukas@wunner.de>

PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y

It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for
space-constrained devices such as routers, such a configuration may
actually make sense.

However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled,
unnecessarily increasing the kernel's size.

To rectify that:

* Move pci_mmap_fits() to mmap.c. It is not only needed by
pci-sysfs.c, but also proc.c.

* Move pci_dev_type to probe.c and make it private. It references
pci_dev_attr_groups in pci-sysfs.c. Make that public instead for
consistency with pci_dev_groups, pcibus_groups and pci_bus_groups,
which are likewise public and referenced by struct definitions in
pci-driver.c and probe.c.

* Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and
pci_bus_groups to NULL if CONFIG_SYSFS is disabled. Provide empty
static inlines for pci_{create,remove}_legacy_files() and
pci_{create,remove}_sysfs_dev_files().

Result:

vmlinux size is reduced by 122996 bytes in my arm 32-bit test build.

Link: https://lore.kernel.org/r/85ca95ae8e4d57ccf082c5c069b8b21eb141846e.1698668982.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>


# 95140c2f 22-Nov-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Log bridge info when first enumerating bridge

Log bridge secondary/subordinate bus and window information at the same
time we log the bridge BARs, just after discovering the bridge and before
scanning the bridge's secondary bus. This logs the bridge and downstream
devices in a more logical order:

- pci 0000:00:01.0: [8086:1901] type 01 class 0x060400
- pci 0000:01:00.0: [10de:13b6] type 00 class 0x030200
- pci 0000:01:00.0: reg 0x10: [mem 0xec000000-0xecffffff]
- pci 0000:00:01.0: PCI bridge to [bus 01]
- pci 0000:00:01.0: bridge window [io 0xe000-0xefff]

+ pci 0000:00:01.0: [8086:1901] type 01 class 0x060400
+ pci 0000:00:01.0: PCI bridge to [bus 01]
+ pci 0000:00:01.0: bridge window [io 0xe000-0xefff]
+ pci 0000:01:00.0: [10de:13b6] type 00 class 0x030200
+ pci 0000:01:00.0: reg 0x10: [mem 0xec000000-0xecffffff]

Note that we read the windows into a temporary struct resource that is
thrown away, not into the resources in the struct pci_bus.

The windows may be adjusted after we know what downstream devices require,
and those adjustments are logged as they are made.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 63c6ebb2 04-Dec-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Log bridge windows conditionally

Previously pci_read_bridge_io(), pci_read_bridge_mmio(), and
pci_read_bridge_mmio_pref() unconditionally logged the bridge window
resource. A future change will call these functions earlier and more
often. Add a "log" parameter so callers can control whether to generate
the log message. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 281e1f13 22-Nov-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Supply bridge device, not secondary bus, to read window details

Previously we logged information about devices *below* the bridge before
logging information about the bridge itself, e.g.,

pci 0000:00:01.0: [8086:1901] type 01 class 0x060400
pci 0000:01:00.0: [10de:13b6] type 00 class 0x030200
pci 0000:01:00.0: reg 0x10: [mem 0xec000000-0xecffffff]
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0: bridge window [io 0xe000-0xefff]

This is partly because the bridge windows are read in this path:

pci_scan_child_bus_extend
for (devfn = 0; devfn < 256; devfn += 8)
pci_scan_slot(bus, devfn) # scan below bridge
pcibios_fixup_bus(bus)
pci_read_bridge_bases(bus) # read bridge windows
pci_read_bridge_io(bus)

Remove the assumption that the secondary (child) pci_bus already exists by
passing in the bridge device (instead of the pci_bus) and a resource
pointer when reading bridge windows. A future change can use this to log
the bridge details before we enumerate the devices below the bridge.

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6f32099a 04-Dec-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Move pci_read_bridge_windows() below individual window accessors

Move pci_read_bridge_windows() below the functions that read the I/O,
memory, and prefetchable memory windows, so pci_read_bridge_windows() can
use them in the future. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# dc4e6f21 06-Nov-2021 Puranjay Mohan <puranjay12@gmail.com>

PCI: Use resource names in PCI log messages

Use the pci_resource_name() to get the name of the resource and use it
while printing log messages.

[bhelgaas: rename to match struct resource * names, also use names in other
BAR messages]
Link: https://lore.kernel.org/r/20211106112606.192563-3-puranjay12@gmail.com
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 35259ff1 10-Nov-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Log device type during enumeration

Log the device type when enumeration a device. Sample output changes:

- pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
+ pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 conventional PCI endpoint

- pci 0000:00:1c.0: [8086:a110] type 01 class 0x060400
+ pci 0000:00:1c.0: [8086:a110] type 01 class 0x060400 PCIe Root Port

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e0f0a16f 10-Oct-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Use FIELD_GET()

Use FIELD_GET() and FIELD_PREP() to remove dependences on the field
position, i.e., the shift value. No functional change intended.

Link: https://lore.kernel.org/r/20231010204436.1000644-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# d15f1805 11-Sep-2023 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

PCI: Do error check on own line to split long "if" conditions

Placing PCI error code check inside "if" condition usually results in need
to split lines. Combined with additional conditions the "if" condition
becomes messy.

Convert to the usual error handling pattern with an additional variable to
improve code readability. In addition, reverse the logic in
pci_find_vsec_capability() to get rid of &&.

No functional changes intended.

Link: https://lore.kernel.org/r/20230911125354.25501-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: PCI_POSSIBLE_ERROR()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 8ec9c1d5 05-Sep-2023 Ross Lagerwall <ross.lagerwall@citrix.com>

PCI: Free released resource after coalescing

release_resource() doesn't actually free the resource or resource list
entry so free the resource list entry to avoid a leak.

Closes: https://lore.kernel.org/r/878r9sga1t.fsf@kernel.org/
Fixes: e54223275ba1 ("PCI: Release resource invalidated by coalescing")
Link: https://lore.kernel.org/r/20230906110846.225369-1-ross.lagerwall@citrix.com
Reported-by: Kalle Valo <kvalo@kernel.org>
Tested-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.16+


# 86b4ad7d 24-Aug-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fix typos in docs and comments

Fix typos in docs and comments.

Link: https://lore.kernel.org/r/20230824193712.542167-11-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>


# 5e70d0ac 17-Jul-2023 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

PCI: Add locking to RMW PCI Express Capability Register accessors

Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and this
could result in losing the changes one of the writers intended to make.

Add pcie_cap_lock spinlock into the struct pci_dev and use it to protect
bit changes made in the RMW capability accessors. Protect only a selected
set of registers by differentiating the RMW accessor internally to
locked/unlocked variants using a wrapper which has the same signature as
pcie_capability_clear_and_set_word(). As the Capability Register (pos)
given to the wrapper is always a constant, the compiler should be able to
simplify all the dead-code away.

So far only the Link Control Register (ASPM, hotplug, link retraining,
various drivers) and the Root Control Register (AER & PME) seem to
require RMW locking.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63eb1 ("r8169: more 8168dp support.")
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default")
Link: https://lore.kernel.org/r/20230717120503.15276-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>


# c925cfaf 14-Jul-2023 Rob Herring <robh@kernel.org>

PCI: Explicitly include correct DT includes

The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus. As
part of that merge prepping Arm DT support 13 years ago, they "temporarily"
include each other. They also include platform_device.h and of.h. As a
result, there's a pretty much random mix of those include files used
throughout the tree. In order to detangle these headers and replace the
implicit includes with struct declarations, users need to explicitly
include the correct includes.

Link: https://lore.kernel.org/r/20230714174827.4061572-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a89c8224 11-Jun-2023 Maciej W. Rozycki <macro@orcam.me.uk>

PCI: Work around PCIe link training failures

Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch,
using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433,
wired to a SiFive HiFive Unmatched board. In this setup the switches
should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if
necessary.

Instead the link continues oscillating between the two speeds, at the rate
of 34-35 times per second, with link training reported repeatedly active
~84% of the time. Limiting the target link speed to 2.5GT/s with the
upstream ASM2824 device makes the two switches communicate correctly.
Removing the speed restriction afterwards makes the two devices switch to
5.0GT/s then.

Make use of these observations and detect the inability to train the link
by checking for the Data Link Layer Link Active status bit being off while
the Link Bandwidth Management Status indicating that hardware has changed
the link speed or width in an attempt to correct unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this is
successful, lift the restriction, letting the devices negotiate a higher
speed.

Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards (currently only ASM2824), that already report their data link
being up.

[bhelgaas: reorder and squash stubs from
https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk
to avoid adding stubs that do nothing]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 42adbdc7 11-Jun-2023 Maciej W. Rozycki <macro@orcam.me.uk>

PCI: Initialize dev->link_active_reporting earlier

Determine whether Data Link Layer Link Active Reporting is available before
calling any fixups so that the cached value can be used there and later on.

[bhelgaas: move to set_pcie_port_type() where other PCIe init is done]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310122210.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e5422327 25-May-2023 Ross Lagerwall <ross.lagerwall@citrix.com>

PCI: Release resource invalidated by coalescing

When contiguous windows are coalesced by pci_register_host_bridge(), the
second resource is expanded to include the first, and the first is
invalidated and consequently not added to the bus. However, it remains in
the resource hierarchy. For example, these windows:

fec00000-fec7ffff : PCI Bus 0000:00
fec80000-fecbffff : PCI Bus 0000:00

are coalesced into this, where the first resource remains in the tree with
start/end zeroed out:

00000000-00000000 : PCI Bus 0000:00
fec00000-fecbffff : PCI Bus 0000:00

In some cases (e.g. the Xen scratch region), this causes future calls to
allocate_resource() to choose an inappropriate location which the caller
cannot handle.

Fix by releasing the zeroed-out resource and removing it from the resource
hierarchy.

[bhelgaas: commit log]
Fixes: 7c3855c423b1 ("PCI: Coalesce host bridge contiguous apertures")
Link: https://lore.kernel.org/r/20230525153248.712779-1-ross.lagerwall@citrix.com
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.16+


# ac048403 11-Mar-2023 Lukas Wunner <lukas@wunner.de>

PCI/DOE: Create mailboxes on device enumeration

Currently a DOE instance cannot be shared by multiple drivers because
each driver creates its own pci_doe_mb struct for a given DOE instance.
For the same reason a DOE instance cannot be shared between the PCI core
and a driver.

Moreover, finding out which protocols a DOE instance supports requires
creating a pci_doe_mb for it. If a device has multiple DOE instances,
a driver looking for a specific protocol may need to create a pci_doe_mb
for each of the device's DOE instances and then destroy those which
do not support the desired protocol. That's obviously an inefficient
way to do things.

Overcome these issues by creating mailboxes in the PCI core on device
enumeration.

Provide a pci_find_doe_mailbox() API call to allow drivers to get a
pci_doe_mb for a given (pci_dev, vendor, protocol) triple. This API is
modeled after pci_find_capability() and can later be amended with a
pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
given pci_dev which support a specific protocol.

On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
is unbound. This allows drivers to use DOE in their ->remove() hook.

On surprise removal, cancel ongoing DOE exchanges and prevent new ones
from being scheduled. Thereby ensure that a hot-removed device doesn't
needlessly wait for a running exchange to time out.

Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/40a6f973f72ef283d79dd55e7e6fddc7481199af.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 02992064 04-Apr-2023 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI: Make pci_bus_for_each_resource() index optional

Refactor pci_bus_for_each_resource() in the same way as
pci_dev_for_each_resource(). This allows the index to be hidden inside the
implementation so the caller can omit it when it's not used otherwise.

No functional changes intended.

Link: https://lore.kernel.org/r/20230330162434.35055-6-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


# 0d21e71a 19-Apr-2023 Rob Herring <robh@kernel.org>

PCI: Restrict device disabled status check to DT

Commit 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
checked the firmware device status for both DT and ACPI devices. That
caused a regression in some ACPI systems. The exact reason isn't clear.
It's possibly a firmware bug. For now, at least, refactor the check to
be for DT based systems only.

Note that the original implementation leaked a refcount which is now
correctly handled.

[bhelgaas: Per ACPI r6.5, sec 6.3.7, for devices on an enumerable bus, _STA
must return with bit[0] ("device is present") set]

Link: https://lore.kernel.org/all/m2fs9lgndw.fsf@gmail.com/
Fixes: 6fffbc7ae137 ("PCI: Honor firmware's device disabled status")
Link: https://lore.kernel.org/r/20230419193513.708818-1-robh@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217317
Reported-by: Donald Hunter <donald.hunter@gmail.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Donald Hunter <donald.hunter@gmail.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Binbin Zhou <zhoubinbin@loongson.cn>
Cc: Liu Peibao <liupeibao@loongson.cn>
Cc: Huacai Chen <chenhuacai@loongson.cn>


# 589c3357 12-Dec-2022 Ira Weiny <ira.weiny@intel.com>

PCI/CXL: Export native CXL error reporting control

CXL _OSC Error Reporting Control is used by the OS to determine if
Firmware has control of various CXL error reporting capabilities
including the event logs.

Expose the result of negotiating CXL Error Reporting Control in struct
pci_host_bridge for consumption by the CXL drivers.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: linux-pci@vger.kernel.org
Cc: linux-acpi@vger.kernel.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/20221212070627.1372402-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 9d8ba74a 10-Feb-2023 Geert Uytterhoeven <geert+renesas@glider.be>

PCI: Fix dropping valid root bus resources with .end = zero

On r8a7791/koelsch:

kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xc3a34e00 (size 64):
comm "swapper/0", pid 1, jiffies 4294937460 (age 199.080s)
hex dump (first 32 bytes):
b4 5d 81 f0 b4 5d 81 f0 c0 b0 a2 c3 00 00 00 00 .]...]..........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<fe3aa979>] __kmalloc+0xf0/0x140
[<34bd6bc0>] resource_list_create_entry+0x18/0x38
[<767046bc>] pci_add_resource_offset+0x20/0x68
[<b3f3edf2>] devm_of_pci_get_host_bridge_resources.constprop.0+0xb0/0x390

When coalescing two resources for a contiguous aperture, the second
resource is enlarged to cover the full contiguous range, while the first
resource is marked invalid. This invalidation is done by clearing the
flags, start, and end members.

When adding the initial resources to the bus later, invalid resources are
skipped. Unfortunately, the check for an invalid resource considers only
the end member, causing false positives.

E.g. on r8a7791/koelsch, root bus resource 0 ("bus 00") is skipped, and no
longer registered with pci_bus_insert_busn_res() (causing the memory leak),
nor printed:

pci-rcar-gen2 ee090000.pci: host bridge /soc/pci@ee090000 ranges:
pci-rcar-gen2 ee090000.pci: MEM 0x00ee080000..0x00ee08ffff -> 0x00ee080000
pci-rcar-gen2 ee090000.pci: PCI: revision 11
pci-rcar-gen2 ee090000.pci: PCI host bridge to bus 0000:00
-pci_bus 0000:00: root bus resource [bus 00]
pci_bus 0000:00: root bus resource [mem 0xee080000-0xee08ffff]

Fix this by only skipping resources where all of the flags, start, and end
members are zero.

Fixes: 7c3855c423b17f6c ("PCI: Coalesce host bridge contiguous apertures")
Link: https://lore.kernel.org/r/da0fcd5e86c74239be79c7cb03651c0fce31b515.1676036673.git.geert+renesas@glider.be
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>


# 6fffbc7a 10-Feb-2023 Rob Herring <robh@kernel.org>

PCI: Honor firmware's device disabled status

If a device has a firmware node (DT/ACPI), and the device is marked
disabled, that is currently ignored. Add a check for this condition and
bail out creating the pci_dev.

This assumes the config space for the device can still be accessed because
they already have by this point in order to identify the device.

Link: https://lore.kernel.org/r/20230210164351.2687475-1-robh@kernel.org
Tested-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Liu Peibao <liupeibao@loongson.cn>
Cc: Huacai Chen <chenhuacai@loongson.cn>


# c14f7ccc 14-Jul-2022 Pali Rohár <pali@kernel.org>

PCI: Assign PCI domain IDs by ida_alloc()

Replace assignment of PCI domain IDs from atomic_inc_return() to
ida_alloc().

Use two IDAs, one for static domain allocations (those which are defined in
device tree) and second for dynamic allocations (all other).

During removal of root bus / host bridge, also release the domain ID. The
released ID can be reused again, for example when dynamically loading and
unloading native PCI host bridge drivers.

This change also allows to mix static device tree assignment and dynamic by
kernel as all static allocations are reserved in dynamic pool.

[bhelgaas: set "err" if "bus->domain_nr < 0"]
Link: https://lore.kernel.org/r/20220714184130.5436-1-pali@kernel.org
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 44e98593 07-Nov-2022 Bjorn Helgaas <bhelgaas@google.com>

Revert "PCI: Clear PCI_STATUS when setting up device"

This reverts commit 6cd514e58f12b211d638dbf6f791fa18d854f09c.

Christophe Fergeau reported that 6cd514e58f12 ("PCI: Clear PCI_STATUS when
setting up device") causes boot failures when trying to start linux guests
with Apple's virtualization framework (for example using
https://developer.apple.com/documentation/virtualization/running_linux_in_a_virtual_machine?language=objc)

6cd514e58f12 only solved a cosmetic problem, so revert it to fix the boot
failures.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137803
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a474d3fb 11-Nov-2022 Thomas Gleixner <tglx@linutronix.de>

PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN

What a zoo:

PCI_MSI
select GENERIC_MSI_IRQ

PCI_MSI_IRQ_DOMAIN
def_bool y
depends on PCI_MSI
select GENERIC_MSI_IRQ_DOMAIN

Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects
GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are
just an indirection to PCI_MSI.

Match the reality and just admit that PCI_MSI requires
GENERIC_MSI_IRQ_DOMAIN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20221111122014.467556921@linutronix.de


# 27829479 26-Sep-2022 Ira Weiny <ira.weiny@intel.com>

PCI: Allow drivers to request exclusive config regions

PCI config space access from user space has traditionally been
unrestricted with writes being an understood risk for device operation.

Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures. This is especially true with the new PCIe Data Object
Exchange (DOE) mailbox protocol where backdoor shenanigans from user
space through things such as vendor defined protocols may affect device
operation without complete breakage.

A prior proposal restricted read and writes completely.[1] Greg and
Bjorn pointed out that proposal is flawed for a couple of reasons.
First, lspci should always be allowed and should not interfere with any
device operation. Second, setpci is a valuable tool that is sometimes
necessary and it should not be completely restricted.[2] Finally
methods exist for full lock of device access if required.

Even though access should not be restricted it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.

Introduce pci_request_config_region_exclusive() to mark exclusive config
regions. Such regions trigger a warning and kernel taint if accessed
via user space.

Create pci_warn_once() to restrict the user from spamming the log.

[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 58e01160 05-Sep-2022 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Fix typo in pci_scan_child_bus_extend()

Should be 'if' not 'of'. Fix this.

Link: https://lore.kernel.org/r/20220905080232.36087-7-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 17d2d67d 05-Sep-2022 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Fix whitespace and indentation

Drop two empty lines from pci_scan_child_bus_extend() and correct
indentation in pci_bridge_distribute_available_resources() to better
follow the kernel coding style.

No functional impact.

Link: https://lore.kernel.org/r/20220905080232.36087-6-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 49ad31e9 05-Sep-2022 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Pass available buses even if the bridge is already configured

If some part of the PCI topology is already configured (by the boot
firmware) but not all, and it includes hotplug bridges, we may need to
extend the bus resources of those bridges to accommodate any future
hotplugs, in the same way we already do with the normal hotplug case.

Pass the available buses to pci_scan_child_bus_extend() even when the
bridge in question is already configured so the bus allocation code can use
these available buses to extend the possible hotplug bridges below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
Link: https://lore.kernel.org/r/20220905080232.36087-3-mika.westerberg@linux.intel.com
Reported-by: Chris Chiu <chris.chiu@canonical.com>
Tested-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 8066cc86 05-Sep-2022 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Fix used_buses calculation in pci_scan_child_bus_extend()

pci_scan_bridge_extend() returns the subordinate bus number needed to cover
all the buses below a bridge. pci_scan_child_bus_extend() computes the
number of buses to reserve by comparing that with the current max bus
number. Previously it did the subtraction in the wrong order, so
'used_buses' was nonsense.

Subtract 'max' from 'cmax' as is done for the similar
pci_scan_bridge_extend() call in the following block.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216000
Fixes: 3374c545c27c ("PCI: Account for all bridges on bus when distributing bus numbers")
Link: https://lore.kernel.org/r/20220905080232.36087-2-mika.westerberg@linux.intel.com
Reported-by: Chris Chiu <chris.chiu@canonical.com>
Tested-by: Chris Chiu <chris.chiu@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# b559afd5 19-Jul-2022 Ira Weiny <ira.weiny@intel.com>

PCI: Replace magic constant for PCI Sig Vendor ID

Replace the magic value in pci_bus_crs_vendor_id() with
PCI_VENDOR_ID_PCI_SIG.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220719205249.566684-3-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 189c6c33 28-Jun-2022 Niklas Schnelle <schnelle@linux.ibm.com>

PCI: Extend isolated function probing to s390

Like the jailhouse hypervisor, s390's PCI architecture allows passing
isolated PCI functions to a guest OS instance. As of now this is was not
utilized even with multi-function support as the s390 PCI code makes sure
that only virtual PCI busses including a function with devfn 0 are
presented to the PCI subsystem. A subsequent change will remove this
restriction.

Allow probing such functions by replacing the existing check for
jailhouse_paravirt() with a new hypervisor_isolated_pci_functions() helper.

Link: https://lore.kernel.org/r/20220628143100.3228092-5-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>


# db360b1e 28-Jun-2022 Niklas Schnelle <schnelle@linux.ibm.com>

PCI: Move jailhouse's isolated function handling to pci_scan_slot()

The special case of the jailhouse hypervisor passing through individual PCI
functions handles scanning for PCI functions even if function 0 does not
exist. Previously this was done with an extra loop duplicating the one in
pci_scan_slot(). By incorporating the check for jailhouse_paravirt() into
pci_scan_slot() we can instead do this as part of the normal slot scan.
Note that with the assignment of dev->multifunction gated by fn > 0 we set
dev->multifunction unconditionally for all functions if function 0 is
missing just as in the existing jailhouse loop.

The only functional change is that we now call pcie_aspm_init_link_state()
for these functions, but this already happened if function 0 was passed
through and should not be a problem.

Link: https://lore.kernel.org/linux-pci/20220408224514.GA353445@bhelgaas/
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220628143100.3228092-4-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>


# fbed59ed 28-Jun-2022 Niklas Schnelle <schnelle@linux.ibm.com>

PCI: Split out next_ari_fn() from next_fn()

In commit b1bd58e448f2 ("PCI: Consolidate "next-function" functions") the
next_fn() function subsumed the traditional and ARI-based next function
determination. This got rid of some needlessly complex function pointer
handling but also reduced the separation between these very different
methods of finding the next function. With the next_fn() cleaned up a bit
we can re-introduce this separation by moving out the ARI handling while
sticking with direct function calls.

Link: https://lore.kernel.org/r/20220628143100.3228092-3-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>


# c3df83e0 28-Jun-2022 Niklas Schnelle <schnelle@linux.ibm.com>

PCI: Clean up pci_scan_slot()

While determining the next PCI function is factored out of pci_scan_slot()
into next_fn(), the former still handles the first function as a special
case, which duplicates the code from the scan loop.

Furthermore the non-ARI branch of next_fn() is generally hard to understand
and especially the check for multifunction devices is hidden in the
handling of NULL devices for non-contiguous multifunction. It also signals
that no further functions need to be scanned by returning 0 via wraparound
and this is a valid function number.

Improve upon this by transforming the conditions in next_fn() to be easier
to understand.

By changing next_fn() to return -ENODEV instead of 0 when there is no next
function we can then handle the initial function inside the loop and
deduplicate the shared handling. This also makes it more explicit that only
function 0 must exist.

No functional change is intended.

Link: https://lore.kernel.org/r/20220628143100.3228092-2-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>


# 6cd514e5 16-May-2022 Kai-Heng Feng <kai.heng.feng@canonical.com>

PCI: Clear PCI_STATUS when setting up device

We are seeing Master Abort bit is set on Intel I350 ethernet device and its
root port right after boot, probably happened during BIOS phase:

00:06.0 PCI bridge [0604]: Intel Corporation Device [8086:464d] (rev 05) (prog-if 00 [Normal decode])
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-

6e:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-

The Master Abort bit is cleared after S3.

Since there's no functional impact found, clear the PCI_STATUS to treat it
anew at setting up.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215989
Link: https://lore.kernel.org/r/20220517043738.2308499-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 661c4c4f 07-Dec-2021 Sergio Paracuellos <sergio.paracuellos@gmail.com>

PCI: Let pcibios_root_bridge_prepare() access bridge->windows

When pci_register_host_bridge() is called, bridge->windows are already
available. However these windows are being moved temporarily from there.

To let pcibios_root_bridge_prepare() have access to these windows, move the
windows movement after calling this function. This is useful for the MIPS
ralink mt7621 platform so it can set up I/O coherence units and avoid
custom MIPS code in the mt7621 PCIe controller driver.

Link: https://lore.kernel.org/r/20211207104924.21327-2-sergio.paracuellos@gmail.com
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>


# fa52b644 18-Nov-2021 Naveen Naidu <naveennaidu479@gmail.com>

PCI/ERR: Use PCI_POSSIBLE_ERROR() to check config reads

When config pci_ops.read() can detect failed PCI transactions, the data
returned to the CPU is PCI_ERROR_RESPONSE (~0 or 0xffffffff).

Obviously a successful PCI config read may *also* return that data if a
config register happens to contain ~0, so it doesn't definitively indicate
an error unless we know the register cannot contain ~0.

Use PCI_POSSIBLE_ERROR() to check the response we get when we read data
from hardware. This unifies PCI error response checking and makes error
checks consistent and easier to find.

Link: https://lore.kernel.org/r/f4d18d470cb90f9cb52ea155b01528ba2e76e8d6.1637243717.git.naveennaidu479@gmail.com
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d2c64f98 15-Nov-2021 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI: Use pci_find_vsec_capability() when looking for TBT devices

Currently set_pcie_thunderbolt() open-codes pci_find_vsec_capability().
Refactor the former to use the latter. No functional change intended.

Link: https://lore.kernel.org/r/20211115112902.24033-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>


# cd119b09 06-Dec-2021 Thomas Gleixner <tglx@linutronix.de>

PCI/MSI: Move msi_lock to struct pci_dev

It's only required for PCI/MSI. So no point in having it in every struct
device.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211206210224.925241961@linutronix.de


# fd1ae23b 12-Oct-2021 Krzysztof Wilczyński <kw@linux.com>

PCI: Prefer 'unsigned int' over bare 'unsigned'

The bare "unsigned" type implicitly means "unsigned int", but the preferred
coding style is to use the complete type name.

Update the bare use of "unsigned" to the preferred "unsigned int".

No change to functionality intended.

See a1ce18e4f941 ("checkpatch: warn on bare unsigned or signed declarations
without int").

Link: https://lore.kernel.org/r/20211013014136.1117543-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e1b0d0bb 12-Oct-2021 Mingchuang Qiao <mingchuang.qiao@mediatek.com>

PCI: Re-enable Downstream Port LTR after reset or hotplug

Per PCIe r5.0, sec 7.5.3.16, Downstream Ports must disable LTR if the link
goes down (the Port goes DL_Down status). This is a problem because the
Downstream Port's dev->ltr_path is still set, so we think LTR is still
enabled, and we enable LTR in the Endpoint. When it sends LTR messages,
they cause Unsupported Request errors at the Downstream Port.

This happens in the reset path, where we may enable LTR in
pci_restore_pcie_state() even though the Downstream Port disabled LTR
because the reset caused a link down event.

It also happens in the hot-remove and hot-add path, where we may enable LTR
in pci_configure_ltr() even though the Downstream Port disabled LTR when
the hot-remove took the link down.

In these two scenarios, check the upstream bridge and restore its LTR
enable if appropriate.

The Unsupported Request may be logged by AER as follows:

pcieport 0000:00:1d.0: AER: Uncorrected (Non-Fatal) error received: id=00e8
pcieport 0000:00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=00e8(Requester ID)
pcieport 0000:00:1d.0: device [8086:9d18] error status/mask=00100000/00010000
pcieport 0000:00:1d.0: [20] Unsupported Request (First)

In addition, if LTR is not configured correctly, the link cannot enter the
L1.2 state, which prevents some machines from entering the S0ix low power
state.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20211012075614.54576-1-mingchuang.qiao@mediatek.com
Reported-by: Utkarsh H Patel <utkarsh.h.patel@intel.com>
Signed-off-by: Mingchuang Qiao <mingchuang.qiao@mediatek.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>


# 7c3855c4 13-Jul-2021 Kai-Heng Feng <kai.heng.feng@canonical.com>

PCI: Coalesce host bridge contiguous apertures

Built-in graphics at 07:00.0 on HP EliteDesk 805 G6 doesn't work because
graphics can't get the BAR it needs. The BIOS configuration is
correct: BARs 0 and 2 both fit in the 00:08.1 bridge window.

But that 00:08.1 window covers two host bridge apertures from _CRS.
Previously we assumed this was illegal, so we clipped the window to fit
into one aperture (see 0f7e7aee2f37 ("PCI: Add pci_bus_clip_resource() to
clip to fit upstream window")).

pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]

pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100401fffff 64bit pref]
pci 0000:07:00.0: reg 0x10: [mem 0x10030000000-0x1003fffffff 64bit pref]
pci 0000:07:00.0: reg 0x18: [mem 0x10040000000-0x100401fffff 64bit pref]

pci 0000:00:08.1: can't claim BAR 15 [mem 0x10030000000-0x100401fffff 64bit pref]: no compatible bridge window
pci 0000:00:08.1: [mem 0x10030000000-0x100401fffff 64bit pref] clipped to [mem 0x10030000000-0x100303fffff 64bit pref]
pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100303fffff 64bit pref]

pci 0000:07:00.0: can't claim BAR 0 [mem 0x10030000000-0x1003fffffff 64bit pref]: no compatible bridge window
pci 0000:07:00.0: can't claim BAR 2 [mem 0x10040000000-0x100401fffff 64bit pref]: no compatible bridge window

However, the host bridge apertures are contiguous, so there's no need to
clip in this case. Coalesce contiguous apertures so we can allocate from
the entire contiguous region.

Previous commit 65db04053efe ("PCI: Coalesce host bridge contiguous
apertures") was similar but sorted the apertures, and Guenter Roeck
reported a regression in ppc:sam460ex qemu emulation from nvme; see
https://lore.kernel.org/all/20210709231529.GA3270116@roeck-us.net/

[bhelgaas: commit log]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20210713125007.1260304-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Guenter Roeck <linux@roeck-us.net>


# 06dc660e 13-Sep-2021 Oliver O'Halloran <oohall@gmail.com>

PCI: Rename pcibios_add_device() to pcibios_device_add()

The general convention for pcibios_* hooks is that they're named after the
corresponding pci_* function they provide a hook for. The exception is
pcibios_add_device() which provides a hook for pci_device_add().

Rename pcibios_add_device() to pcibios_device_add() so it matches
pci_device_add().

Also, remove the export of the microblaze version. The only caller must be
compiled as a built-in so there's no reason for the export.

Link: https://lore.kernel.org/r/20210913152709.48013-1-oohall@gmail.com
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> # s390


# 41dd40fd 26-Jul-2021 Boqun Feng <boqun.feng@gmail.com>

PCI: Support populating MSI domains of root buses via bridges

Currently, at probing time, the MSI domains of root buses are populated
if either the information of MSI domain is available from firmware (DT
or ACPI), or arch-specific sysdata is used to pass the fwnode of the MSI
domain. These two conditions don't cover all, e.g. Hyper-V virtual PCI
on ARM64, which doesn't have the MSI information in the firmware and
couldn't use arch-specific sysdata because running on an architecture
with PCI_DOMAINS_GENERIC=y.

To support populating MSI domains of the root buses at the probing when
neither of the above condition is true, the ->msi_domain of the
corresponding bridge device is used: in pci_host_bridge_msi_domain(),
which should return the MSI domain of the root bus, the ->msi_domain of
the corresponding bridge is fetched first as a potential value of the
MSI domain of the root bus.

In order to use the approach to populate MSI domains, the driver needs
to dev_set_msi_domain() on the bridge before calling
pci_register_host_bridge(), and makes sure GENERIC_MSI_IRQ_DOMAIN=y.

Another advantage of this new approach is providing an arch-independent
way to populate MSI domains, which allows sharing the driver code as
much as possible between architectures.

Originally-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210726180657.142727-3-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# 15d82ca2 26-Jul-2021 Boqun Feng <boqun.feng@gmail.com>

PCI: Introduce domain_nr in pci_host_bridge

Currently we retrieve the PCI domain number of the host bridge from the
bus sysdata (or pci_config_window if PCI_DOMAINS_GENERIC=y). Actually
we have the information at PCI host bridge probing time, and it makes
sense that we store it into pci_host_bridge. One benefit of doing so is
the requirement for supporting PCI on Hyper-V for ARM64, because the
host bridge of Hyper-V doesn't have pci_config_window, whereas ARM64 is
a PCI_DOMAINS_GENERIC=y arch, so we cannot retrieve the PCI domain
number from pci_config_window on ARM64 Hyper-V guest.

As the preparation for ARM64 Hyper-V PCI support, we introduce the
domain_nr in pci_host_bridge and a sentinel value to allow drivers to
set domain numbers properly at probing time. Currently
CONFIG_PCI_DOMAINS_GENERIC=y archs are only users of this
newly-introduced field.

Link: https://lore.kernel.org/r/20210726180657.142727-2-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# 375553a9 17-Aug-2021 Shanker Donthineni <sdonthineni@nvidia.com>

PCI: Setup ACPI fwnode early and at the same time with OF

Previously, the ACPI_COMPANION() of a pci_dev was usually set by
acpi_bind_one() in this path:

pci_device_add
pci_configure_device
pci_init_capabilities
device_add
device_platform_notify
acpi_platform_notify
acpi_device_notify # KOBJ_ADD
acpi_bind_one
ACPI_COMPANION_SET

However, things like pci_configure_device() and pci_init_capabilities()
that run before device_add() need the ACPI_COMPANION, e.g.,
acpi_pci_bridge_d3() uses a _DSD method to learn about D3 support. These
places had special-case code to manually look up the ACPI_COMPANION.

Set the ACPI_COMPANION earlier, in pci_setup_device(), so it will be
available while configuring the device. This covers both paths to creating
pci_dev objects:

pci_scan_single_device # for normal non-SR-IOV devices
pci_scan_device
pci_setup_device
pci_set_acpi_fwnode
pci_device_add

pci_iov_add_virtfn # for SR-IOV virtual functions
pci_setup_device
pci_set_acpi_fwnode

Also move the OF fwnode setup to the same spot.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20210817180500.1253-8-ameynarkhede03@gmail.com
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>


# 4ec36dfe 17-Aug-2021 Amey Narkhede <ameynarkhede03@gmail.com>

PCI: Remove reset_fn field from pci_dev

"reset_fn" indicates whether the device supports any reset mechanism.
Remove the use of reset_fn in favor of the reset_methods array that tracks
supported reset mechanisms of a device and their ordering.

The octeon driver incorrectly used reset_fn to detect whether the device
supports FLR or not. Use pcie_reset_flr() to probe whether it supports FLR.

Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-5-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>


# e20afa06 17-Aug-2021 Amey Narkhede <ameynarkhede03@gmail.com>

PCI: Add array to track reset method ordering

Add reset_methods[] in struct pci_dev to keep track of reset mechanisms
supported by the device and their ordering.

Refactor probing and reset functions to take advantage of calling
convention of reset functions.

Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-4-ameynarkhede03@gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>


# 69139244 17-Aug-2021 Amey Narkhede <ameynarkhede03@gmail.com>

PCI: Cache PCIe Device Capabilities register

Add a new member called devcap in struct pci_dev for caching the PCIe
Device Capabilities register to avoid reading PCI_EXP_DEVCAP multiple
times.

Refactor pcie_has_flr() to use cached device capabilities.

Link: https://lore.kernel.org/r/20210817180500.1253-2-ameynarkhede03@gmail.com
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>


# fd00faa3 08-Aug-2021 Heiner Kallweit <hkallweit1@gmail.com>

PCI/VPD: Embed struct pci_vpd in struct pci_dev

Now that struct pci_vpd is really small, simplify the code by embedding
struct pci_vpd directly in struct pci_dev instead of dynamically allocating
it.

Link: https://lore.kernel.org/r/d898489e-22ba-71f1-2f31-f1a78dc15849@gmail.com
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 62efe3ee 09-Jul-2021 Bjorn Helgaas <bhelgaas@google.com>

Revert "PCI: Coalesce host bridge contiguous apertures"

This reverts commit 65db04053efea3f3e412a7e0cc599962999c96b4.

Guenter reported that after 65db04053efe, the ppc:sam460ex qemu emulation
no longer boots from nvme:

nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0
nvme nvme0: Removing after probe failure status: -19

Link: https://lore.kernel.org/r/20210709231529.GA3270116@roeck-us.net
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 65db0405 01-Apr-2021 Kai-Heng Feng <kai.heng.feng@canonical.com>

PCI: Coalesce host bridge contiguous apertures

Built-in graphics on HP EliteDesk 805 G6 doesn't work because graphics
can't get the BAR it needs:

pci_bus 0000:00: root bus resource [mem 0x10020200000-0x100303fffff window]
pci_bus 0000:00: root bus resource [mem 0x10030400000-0x100401fffff window]

pci 0000:00:08.1: bridge window [mem 0xd2000000-0xd23fffff]
pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100401fffff 64bit pref]
pci 0000:00:08.1: can't claim BAR 15 [mem 0x10030000000-0x100401fffff 64bit pref]: no compatible bridge window
pci 0000:00:08.1: [mem 0x10030000000-0x100401fffff 64bit pref] clipped to [mem 0x10030000000-0x100303fffff 64bit pref]
pci 0000:00:08.1: bridge window [mem 0x10030000000-0x100303fffff 64bit pref]
pci 0000:07:00.0: can't claim BAR 0 [mem 0x10030000000-0x1003fffffff 64bit pref]: no compatible bridge window
pci 0000:07:00.0: can't claim BAR 2 [mem 0x10040000000-0x100401fffff 64bit pref]: no compatible bridge window

However, the root bus has two contiguous apertures that can contain the
child resource requested.

Coalesce contiguous apertures so we can allocate from the entire contiguous
region.

[bhelgaas: fold in https://lore.kernel.org/r/20210528170242.1564038-1-kai.heng.feng@canonical.com]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20210401131252.531935-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ea4aae05 11-Mar-2021 Niklas Schnelle <schnelle@linux.ibm.com>

PCI: Print a debug message on PCI device release

Commit 62795041418d ("PCI: enhance physical slot debug information") added
a debug print on releasing the PCI slot and another message on destroying
it. There is however no debug print on releasing the PCI device structure
itself and even with closely looking at the kernel log during hotplug
testing, I overlooked several missing pci_dev_put() calls for way too long.

Add a debug print in pci_release_dev() making it much easier to spot when
the PCI device structure is not released when it is supposed to be.

Link: https://lore.kernel.org/r/20210311132312.2882425-1-schnelle@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c037b6c8 24-May-2021 Rajat Jain <rajatja@google.com>

PCI: Add sysfs "removable" attribute

A PCI device is "external_facing" if it's a Root Port with the ACPI
"ExternalFacingPort" property or if it has the DT "external-facing"
property. We consider everything downstream from such a device to
be removable by user.

We're mainly concerned with consumer platforms with user accessible
Thunderbolt ports that are vulnerable to DMA attacks, and we expect those
ports to be identified by firmware as "ExternalFacingPort". Devices in
traditional hotplug slots can technically be removed, but the expectation
is that unless the port is marked with "ExternalFacingPort", such devices
are less accessible to user / may not be removed by end user, and thus not
exposed as "removable" to userspace.

This can be used to implement userspace policies tailored for
user removable devices. Eg usage:
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/2591812
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/2795038
(code uses such an attribute to remove external PCI devices or disable
features on them as needed by the policy desired)

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rajat Jain <rajatja@google.com>
Link: https://lore.kernel.org/r/20210524171812.18095-2-rajatja@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 85aabbd7 10-May-2021 Jean-Philippe Brucker <jean-philippe@linaro.org>

PCI/MSI: Fix MSIs for generic hosts that use device-tree's "msi-map"

Since commit 9ec37efb8783 ("PCI/MSI: Make pci_host_common_probe() declare
its reliance on MSI domains"), platforms that rely on the "msi-map"
device-tree property don't get MSIs anymore.

On the Arm Fast Model for example [1], the host bridge doesn't have a
"msi-parent" property since it doesn't itself generate MSIs, and so doesn't
get a MSI domain. It has an "msi-map" property instead to describe MSI
controllers of child devices. As a result, due to the new msi_domain check
in pci_register_host_bridge(), the whole bus gets PCI_BUS_FLAGS_NO_MSI.

Check whether the root complex has an "msi-map" property before giving
up on MSIs.

[1] arch/arm64/boot/dts/arm/fvp-base-revc.dts

Fixes: 9ec37efb8783 ("PCI/MSI: Make pci_host_common_probe() declare its reliance on MSI domains")
Link: https://lore.kernel.org/r/20210510173129.750496-1-jean-philippe@linaro.org
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>


# 94e89b14 30-Mar-2021 Marc Zyngier <maz@kernel.org>

PCI/MSI: Let PCI host bridges declare their reliance on MSI domains

There is a whole class of host bridges that cannot know whether
MSIs will be provided or not, as they rely on other blocks
to provide the MSI functionnality, using MSI domains. This is
the case for example on systems that use the ARM GIC architecture.

Introduce a new attribute ('msi_domain') indicating that implicit
dependency, and use this property to set the NO_MSI flag when
no MSI domain is found at probe time.

Link: https://lore.kernel.org/r/20210330151145.997953-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# 3a05d08f 30-Mar-2021 Marc Zyngier <maz@kernel.org>

PCI/MSI: Drop use of msi_controller from core code

As there is no driver using msi_controller, we can now safely
remove its use from the PCI probe code.

Link: https://lore.kernel.org/r/20210330151145.997953-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# c99e755a 24-Jan-2021 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

PCI: Release OF node in pci_scan_device()'s error path

In pci_scan_device(), if pci_setup_device() fails for any reason, the code
will not release device's of_node by calling pci_release_of_node(). Fix
that by calling the release function.

Fixes: 98d9f30c820d ("pci/of: Match PCI devices to OF nodes dynamically")
Link: https://lore.kernel.org/r/20210124232826.1879-1-dmitry.baryshkov@linaro.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>


# cbc40d5c 02-Dec-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI/MSI: Move MSI/MSI-X init to msi.c

Move pci_msi_setup_pci_dev(), which disables MSI and MSI-X interrupts, from
probe.c to msi.c so it's with all the other MSI code and more consistent
with other capability initialization. This means we must compile msi.c
always, even without CONFIG_PCI_MSI, so wrap the rest of msi.c in an #ifdef
and adjust the Makefile accordingly. No functional change intended.

Link: https://lore.kernel.org/r/20201203185110.1583077-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>


# 90655631 20-Nov-2020 Sean V Kelley <sean.v.kelley@intel.com>

PCI/ERR: Cache RCEC EA Capability offset in pci_init_capabilities()

Extend support for Root Complex Event Collectors by decoding and caching
the RCEC Endpoint Association Extended Capabilities when enumerating. Use
that cached information for later error source reporting. See PCIe r5.0,
sec 7.9.10.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-4-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


# 2f0cd59c 23-Oct-2020 Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

PCI: Fix kernel-doc markup

Update kernel-doc so the names in the doc match the prototypes.

Link: https://lore.kernel.org/r/f19caf7a68f8365c8b573a42b4ac89ec21925c73.1603469755.git.mchehab+huawei@kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 34191749 18-Nov-2020 Gustavo Pimentel <Gustavo.Pimentel@synopsys.com>

PCI: Decode PCIe 64 GT/s link speed

PCIe r6.0, sec 7.5.3.18, defines a new 64.0 GT/s bit in the Supported Link
Speeds Vector of Link Capabilities 2.

This patch does not affect the speed of the link, which should be
negotiated automatically by the hardware; it only adds decoding when
showing the speed to the user.

Decode this new speed. Previously, reading the speed of a link operating
at this speed showed "Unknown speed" instead of "64.0 GT/s".

Link: https://lore.kernel.org/r/aaaab33fe18975e123a84aebce2adb85f44e2bbe.1605739760.git.gustavo.pimentel@synopsys.com
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>


# ecdf57b4 15-Oct-2020 Saheed O. Bolarinwa <refactormyself@gmail.com>

PCI/ASPM: Remove struct aspm_register_info.l1ss_cap_ptr

Save the L1 Substates Capability pointer in struct pci_dev. Then we don't
have to keep track of it in the struct aspm_register_info and struct
pcie_link_state, which makes the code easier to read. No functional change
intended.

[bhelgaas: split to a separate patch]
Link: https://lore.kernel.org/r/20201015193039.12585-8-helgaas@kernel.org
Signed-off-by: Saheed O. Bolarinwa <refactormyself@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6e8e104d 20-Aug-2020 Rob Herring <robh@kernel.org>

PCI: Also call .add_bus() callback for root bus

Similar to pcibios_add_bus(), call pci_ops.add_bus() when the root bus
is added. This allows host bridge drivers to do any setup requiring a
bus pointer.

There are currently no .add_bus() callbacks, so this is safe to do.

Link: https://lore.kernel.org/r/20200821035420.380495-15-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>


# 07e29295 20-Aug-2020 Rob Herring <robh@kernel.org>

PCI: Allow root and child buses to have different pci_ops

PCI host bridges often have different ways to access the root and child
bus config spaces. The host bridge drivers have invented their own
abstractions to handle this. Let's support having different root and
child bus pci_ops so these per driver abstractions can be removed.

Link: https://lore.kernel.org/r/20200821035420.380495-2-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>


# 669cbc70 21-Jul-2020 Rob Herring <robh@kernel.org>

PCI: Move DT resource setup into devm_pci_alloc_host_bridge()

Now that pci_parse_request_of_pci_ranges() callers just setup
pci_host_bridge.windows and dma_ranges directly and don't need the bus
range returned, we can just initialize them when allocating the
pci_host_bridge struct.

With this, pci_parse_request_of_pci_ranges() becomes a static function.

Link: https://lore.kernel.org/r/20200722022514.1283916-19-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>


# b7360f60 27-Jul-2020 Tiezhu Yang <yangtiezhu@loongson.cn>

PCI: Announce device after early fixups

Announce the device, e.g.,

pci 0000:00:00.0: [8086:5910] type 00 class 0x060000

after running early fixups, so the log message reflects any device type or
class code fixups.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/1595833615-8049-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 4f5c883d 21-Jul-2020 Rob Herring <robh@kernel.org>

PCI: Move setting pci_host_bridge.busnr out of host drivers

Most host drivers only parse the DT bus range to set the root bus number
in pci_host_bridge.busnr. The ones that don't set busnr are buggy in
that they ignore what's in DT. Let's set busnr in pci_scan_root_bus_bridge()
where we already check for the bus resource and remove setting it in
host drivers.

Link: https://lore.kernel.org/r/20200722022514.1283916-12-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jingoo Han <jingoohan1@gmail.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Will Deacon <will@kernel.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Marek Vasut <marek.vasut+renesas@gmail.com>
Cc: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Cc: linux-tegra@vger.kernel.org
Cc: linux-mediatek@lists.infradead.org
Cc: linux-renesas-soc@vger.kernel.org


# 6a589900 21-Jul-2020 Rob Herring <robh@kernel.org>

PCI: Set default bridge parent device

The host bridge's parent device is always the platform device. As we
already have a pointer to it in the devres functions, let's initialize
the parent device. Drivers can still override the parent if desired.

Link: https://lore.kernel.org/r/20200722022514.1283916-3-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>


# 99b50be9 07-Jul-2020 Rajat Jain <rajatja@google.com>

PCI: Treat "external-facing" devices themselves as internal

"External-facing" devices are internal devices that expose PCIe hierarchies
such as Thunderbolt outside the platform [1]. Previously these internal
devices were marked as "untrusted" the same as devices downstream from
them.

Use the ACPI or DT information to identify external-facing devices, but
only mark the devices *downstream* from them as "untrusted" [2]. The
external-facing device itself is no longer marked as untrusted.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports
[2] https://lore.kernel.org/linux-pci/20200610230906.GA1528594@bjorn-Precision-5520/
Link: https://lore.kernel.org/r/20200707224604.3737893-3-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 52fbf5bd 07-Jul-2020 Rajat Jain <rajatja@google.com>

PCI: Cache ACS capability offset in device

Currently the ACS capability is being looked up at a number of places. Read
and store it once at enumeration so that it can be used by all later. No
functional change intended.

Link: https://lore.kernel.org/r/20200707224604.3737893-2-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b6caa1d8 26-May-2020 Jiaxun Yang <jiaxun.yang@flygoat.com>

PCI: Don't disable decoding when mmio_always_on is set

Don't disable MEM/IO decoding when a device have both non_compliant_bars
and mmio_always_on.

That would allow us quirk devices with junk in BARs but can't disable
their decoding.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Acked-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>


# aa0ce96d 27-Mar-2020 Ashok Raj <ashok.raj@intel.com>

PCI: Program MPS for RCiEP devices

Root Complex Integrated Endpoints (RCiEPs) do not have an upstream bridge,
so pci_configure_mps() previously ignored them, which may result in reduced
performance.

Instead, program the Max_Payload_Size of RCiEPs to the maximum supported
value (unless it is limited for the PCIE_BUS_PEER2PEER case). This also
affects the subsequent programming of Max_Read_Request_Size because Linux
programs MRRS based on the MPS value.

Fixes: 9dae3a97297f ("PCI: Move MPS configuration check to pci_configure_device()")
Link: https://lore.kernel.org/r/1585343775-4019-1-git-send-email-ashok.raj@intel.com
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org


# 9885440b 13-May-2020 Rob Herring <robh@kernel.org>

PCI: Fix pci_host_bridge struct device release/free handling

The PCI code has several paths where the struct pci_host_bridge is freed
directly. This is wrong because it contains a struct device which is
refcounted and should be freed using put_device(). This can result in
use-after-free errors. I think this problem has existed since 2012 with
commit 7b5436635800 ("PCI: add generic device into pci_host_bridge
struct"). It generally hasn't mattered as most host bridge drivers are
still built-in and can't unbind.

The problem is a struct device should never be freed directly once
device_initialize() is called and a ref is held, but that doesn't happen
until pci_register_host_bridge(). There's then a window between allocating
the host bridge and pci_register_host_bridge() where kfree should be used.
This is fragile and requires callers to do the right thing. To fix this, we
need to split device_register() into device_initialize() and device_add()
calls, so that the host bridge struct is always freed by using a
put_device().

devm_pci_alloc_host_bridge() is using devm_kzalloc() to allocate struct
pci_host_bridge which will be freed directly. Instead, we can use a custom
devres action to call put_device().

Link: https://lore.kernel.org/r/20200513223859.11295-2-robh@kernel.org
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>


# 1b54ae83 13-May-2020 Rob Herring <robh@kernel.org>

PCI: Fix pci_register_host_bridge() device_register() error handling

If device_register() has an error, we should bail out of
pci_register_host_bridge() rather than continuing on.

Fixes: 37d6a0a6f470 ("PCI: Add pci_register_host_bridge() interface")
Link: https://lore.kernel.org/r/20200513223859.11295-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>


# 6ae72bfa 09-May-2020 Yicong Yang <yangyicong@hisilicon.com>

PCI: Unify pcie_find_root_port() and pci_find_pcie_root_port()

Previously we used pcie_find_root_port() to find a Root Port from a PCIe
device and pci_find_pcie_root_port() to find a Root Port from a
Conventional PCI device.

Unify the two functions and use pcie_find_root_port() to find a Root Port
from either a Conventional PCI device or a PCIe device. Then there is no
need to distinguish the type of the device.

Link: https://lore.kernel.org/r/1589019568-5216-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # thunderbolt


# ac1c8e35 23-Mar-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/DPC: Add Error Disconnect Recover (EDR) support

Error Disconnect Recover (EDR) is a feature that allows ACPI firmware to
notify OSPM that a device has been disconnected due to an error condition
(ACPI v6.3, sec 5.6.6). OSPM advertises its support for EDR on PCI devices
via _OSC (see [1], sec 4.5.1, table 4-4). The OSPM EDR notify handler
should invalidate software state associated with disconnected devices and
may attempt to recover them. OSPM communicates the status of recovery to
the firmware via _OST (sec 6.3.5.2).

For PCIe, firmware may use Downstream Port Containment (DPC) to support
EDR. Per [1], sec 4.5.1, table 4-6, even if firmware has retained control
of DPC, OSPM may read/write DPC control and status registers during the EDR
notification processing window, i.e., from the time it receives an EDR
notification until it clears the DPC Trigger Status.

Note that per [1], sec 4.5.1 and 4.5.2.4,

1. If the OS supports EDR, it should advertise that to firmware by
setting OSC_PCI_EDR_SUPPORT in _OSC Support.

2. If the OS sets OSC_PCI_EXPRESS_DPC_CONTROL in _OSC Control to request
control of the DPC capability, it must also set OSC_PCI_EDR_SUPPORT in
_OSC Support.

Add an EDR notify handler to attempt recovery.

[1] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888

[bhelgaas: squash add/enable patches into one]
Link: https://lore.kernel.org/r/90f91fe6d25c13f9d2255d2ce97ca15be307e1bb.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>


# 27005618 23-Mar-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/DPC: Cache DPC capabilities in pci_init_capabilities()

Since Error Disconnect Recover needs to use DPC error handling routines
even if the OS doesn't have control of DPC, move the initalization and
caching of DPC capabilities from the DPC driver to pci_init_capabilities().

Link: https://lore.kernel.org/r/5888380657c8b9551675b5dbd48e370e4fd2703d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e56faff5 28-Feb-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI: Add pci_speed_string()

Add pci_speed_string() to return a text description of the supplied bus or
link speed. The slot code previously used the private
pci_bus_speed_strings[] array for this purpose, but adding this interface
will enable us to consolidate similar code elsewhere.

Export pcie_link_speed[] and pci_speed_string() so they can be used by
modules.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# bbd8810d 03-Sep-2019 Krzysztof Wilczynski <kw@linux.com>

PCI: Remove unused includes and superfluous struct declaration

Remove <linux/pci.h> and <linux/msi.h> from being included directly as part
of the include/linux/of_pci.h, and remove superfluous declaration of struct
of_phandle_args.

Move users of include <linux/of_pci.h> to include <linux/pci.h> and
<linux/msi.h> directly rather than rely on both being included transitively
through <linux/of_pci.h>.

Link: https://lore.kernel.org/r/20190903113059.2901-1-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>


# 73884a70 03-Nov-2019 Subbaraya Sundeep <sbhatta@marvell.com>

PCI: Do not use bus number zero from EA capability

As per PCIe r5.0, sec 7.8.5.2, fixed bus numbers of a bridge must be zero
when no function that uses EA is located behind it. Hence, if EA supplies
bus numbers of zero, assign bus numbers normally. A secondary bus can
never have a bus number of zero, so setting a bridge's Secondary Bus Number
to zero makes downstream devices unreachable.

[bhelgaas: retain bool return value so "zero is invalid" logic is local]
Fixes: 2dbce5901179 ("PCI: Assign bus numbers present in EA capability for bridges")
Link: https://lore.kernel.org/r/1572850664-9861-1-git-send-email-sundeep.lkml@gmail.com
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v5.2+


# ad508610 19-Oct-2019 Yunsheng Lin <linyunsheng@huawei.com>

PCI: Warn if no host bridge NUMA node info

In pci_call_probe(), we try to run driver probe functions on the node where
the device is attached. If we don't know which node the device is attached
to, the driver will likely run on the wrong node. This will still work,
but performance will not be as good as it could be.

On NUMA systems, warn if we don't know which node a PCI host bridge is
attached to. This is likely an indication that ACPI didn't supply a _PXM
method or the DT didn't supply a "numa-node-id" property.

[bhelgaas: commit log, check bus node]
Link: https://lore.kernel.org/r/1571467543-26125-1-git-send-email-linyunsheng@huawei.com
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9d8b738b 03-Oct-2019 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove useless comments and tidy others

Remove useless comments and tidy others for better readability. Whitespace
changes only.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 751035b8 05-Sep-2019 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/ATS: Cache PASID Capability offset

Previously each PASID interface searched for the PASID Capability. Cache
the capability offset the first time we use it instead of searching each
time.

[bhelgaas: commit log, reorder patch to later, call pci_pasid_init() from
pci_init_capabilities()]
Link: https://lore.kernel.org/r/4957778959fa34eab3e8b3065d1951989c61cb0f.1567029860.git.sathyanarayanan.kuppuswamy@linux.intel.com
Link: https://lore.kernel.org/r/20190905193146.90250-6-helgaas@kernel.org
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c065190b 05-Sep-2019 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/ATS: Cache PRI Capability offset

Previously each PRI interface searched for the PRI Capability. Cache the
capability offset the first time we use it instead of searching each time.

[bhelgaas: commit log, reorder patch to later, call pci_pri_init() from
pci_init_capabilities()]
Link: https://lore.kernel.org/r/0c5495d376faf6dbb8eb2165204c474438aaae65.156
7029860.git.sathyanarayanan.kuppuswamy@linux.intel.com
Link: https://lore.kernel.org/r/20190905193146.90250-5-helgaas@kernel.org
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7608158d 07-Oct-2019 Rob Herring <robh@kernel.org>

PCI: Fix missing bridge dma_ranges resource list cleanup

Commit e80a91ad302b ("PCI: Add dma_ranges window list") added a
dma_ranges resource list, but failed to correctly free the list when
devm_pci_alloc_host_bridge() is used.

Only the iproc host bridge driver is using the dma_ranges list.

Fixes: e80a91ad302b ("PCI: Add dma_ranges window list")
Link: https://lore.kernel.org/r/20191008012325.25700-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Srinath Mannam <srinath.mannam@broadcom.com>


# ca784104 22-Aug-2019 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Get rid of dev->has_secondary_link flag

In some systems, the Device/Port Type in the PCI Express Capabilities
register incorrectly identifies upstream ports as downstream ports.

d0751b98dfa3 ("PCI: Add dev->has_secondary_link to track downstream PCIe
links") addressed this by adding pci_dev.has_secondary_link, which is set
for downstream ports. But this is confusing because pci_pcie_type()
sometimes gives the wrong answer, and it's not obvious that we should use
pci_dev.has_secondary_link instead.

Reduce the confusion by correcting the type of the port itself so that
pci_pcie_type() returns the actual type regardless of what the Device/Port
Type register claims it is. Update the users to call pci_pcie_type() and
pcie_downstream_port() accordingly, and remove pci_dev.has_secondary_link
completely.

Link: https://lore.kernel.org/linux-pci/20190703133953.GK128603@google.com/
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20190822085553.62697-2-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 4a2dbedd 27-Aug-2019 Krzysztof Wilczynski <kw@linux.com>

PCI/ACPI: Remove unnecessary struct hotplug_program_ops

Move the ACPI-specific structs hpx_type0, hpx_type1, hpx_type2 and
hpx_type3 to drivers/pci/pci-acpi.c as they are not used anywhere else.
Then remove the struct hotplug_program_ops that has been shared between
drivers/pci/probe.c and drivers/pci/pci-acpi.c from drivers/pci/pci.h as it
is no longer needed.

The struct hotplug_program_ops was added by 87fcf12e846a ("PCI/ACPI: Remove
the need for 'struct hotplug_params'") and replaced previously used struct
hotplug_params enabling the support for the _HPX Type 3 Setting Record that
was added by f873c51a155a ("PCI/ACPI: Implement _HPX Type 3 Setting
Record").

The new struct allowed for the static functions such program_hpx_type0(),
program_hpx_type1(), etc., from the drivers/pci/probe.c to be called from
the function pci_acpi_program_hp_params() in the drivers/pci/pci-acpi.c.

Previously a programming of _HPX Type 0 was as follows:

drivers/pci/probe.c:

program_hpx_type0()
...
pci_configure_device()
hp_ops = {
.program_type0 = program_hpx_type0,
...
}
pci_acpi_program_hp_params(&hp_ops)

drivers/pci/pci-acpi.c:

pci_acpi_program_hp_params(&hp_ops)
acpi_run_hpx(hp_ops)
decode_type0_hpx_record()
hp_ops->program_type0 # program_hpx_type0() called via hp_ops

After the ACPI-specific functions, structs, enums, etc., have been moved to
drivers/pci/pci-acpi.c there is no need for the hotplug_program_ops as all
of the _HPX Type 0, 1, 2 and 3 are directly accessible.

Link: https://lore.kernel.org/r/20190827094951.10613-4-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 8c3aac6e 27-Aug-2019 Krzysztof Wilczynski <kw@linux.com>

PCI/ACPI: Move _HPP & _HPX functions to pci-acpi.c

Move program_hpx_type0(), program_hpx_type1(), etc., and enums
hpx_type3_dev_type, hpx_type3_fn_type and hpx_type3_cfg_loc to
drivers/pci/pci-acpi.c as these functions and enums are ACPI-specific.

Move structs hpx_type0, hpx_type1, hpx_type2 and hpx_type3 to
drivers/pci/pci.h as these are shared between drivers/pci/pci-acpi.c and
drivers/pci/probe.c.

Link: https://lore.kernel.org/r/20190827094951.10613-3-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e2797ad3 27-Aug-2019 Krzysztof Wilczynski <kw@linux.com>

PCI/ACPI: Rename _HPX structs from hpp_* to hpx_*

The names of the hpp_type0, hpp_type1 and hpp_type2 structs suggest that
they're related to _HPP, when in fact they're related to _HPX.

The struct hpp_type0 denotes an _HPX Type 0 setting record that supersedes
the _HPP setting record, and it has been used interchangeably for _HPP as
per the ACPI specification (see version 6.3, section 6.2.9.1) which states
that it should be applied to PCI, PCI-X and PCI Express devices, with
settings being ignored if they are not applicable.

Rename them to hpx_type0, hpx_type1 and hpx_type2 to reflect their relation
to _HPX rather than _HPP.

Link: https://lore.kernel.org/r/20190827094951.10613-2-kw@linux.com
Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6bf85ba9 23-Jul-2019 Suzuki K Poulose <suzuki.poulose@arm.com>

drivers: Add generic helper to match any device

Add a generic helper to match any/all devices. Using this
introduce new wrappers {bus/driver/class}_find_next_device().

Cc: Elie Morisse <syniurge@gmail.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Nehal Shah <nehal-bakulchandra.shah@amd.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Shyam Sundar S K <shyam-sundar.s-k@amd.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI
Link: https://lore.kernel.org/r/20190723221838.12024-7-suzuki.poulose@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 06013b64 13-Jun-2019 Alex Williamson <alex.williamson@redhat.com>

PCI/IOV: Assume SR-IOV VFs support extended config space.

The SR-IOV specification requires both PFs and VFs to implement a PCIe
capability. Generally this is sufficient to assume extended config space
is present, but we generally also perform additional tests to make sure the
extended config space is reachable and not simply an alias of standard
config space. For a VF to exist extended config space must be accessible
on the PF, therefore we can also assume it to be accessible on the VF.
This enables a micro performance optimization previously implemented in
commit 975bb8b4dc93 ("PCI/IOV: Use VF0 cached config space size for other
VFs") to speed up probing of VFs.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Hao Zheng <yinhe@linux.alibaba.com>


# 76bf6a86 13-Jun-2019 Alex Williamson <alex.williamson@redhat.com>

Revert "PCI/IOV: Use VF0 cached config space size for other VFs"

Revert 975bb8b4dc93 ("PCI/IOV: Use VF0 cached config space size for other
VFs"), which attempted to cache the config space size from the first VF to
re-use for subsequent VFs.

The cached value was determined prior to discovering the PCIe capability on
the VF, which resulted in the first VF reporting the correct config space
size (4K), as it has a special case through pci_cfg_space_size(), while all
the other VFs only reported 256 bytes. As this was only a performance
optimization, we're better off without it.

Fixes: 975bb8b4dc93 ("PCI/IOV: Use VF0 cached config space size for other VFs")
Link: https://lore.kernel.org/r/156046663197.29869.3633634445109057665.stgit@gimli.home
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Hao Zheng <yinhe@linux.alibaba.com>


# 418e3ea1 14-Jun-2019 Suzuki K Poulose <suzuki.poulose@arm.com>

bus_find_device: Unify the match callback with class_find_device

There is an arbitrary difference between the prototypes of
bus_find_device() and class_find_device() preventing their callers
from passing the same pair of data and match() arguments to both of
them, which is the const qualifier used in the prototype of
class_find_device(). If that qualifier is also used in the
bus_find_device() prototype, it will be possible to pass the same
match() callback function to both bus_find_device() and
class_find_device(), which will allow some optimizations to be made in
order to avoid code duplication going forward. Also with that, constify
the "data" parameter as it is passed as a const to the match function.

For this reason, change the prototype of bus_find_device() to match
the prototype of class_find_device() and adjust its callers to use the
const qualifier in accordance with the new prototype of it.

Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Kershner <david.kershner@unisys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Harald Freudenberger <freude@linux.ibm.com>
Cc: Hartmut Knaack <knaack.h@gmx.de>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Jamet <michael.jamet@intel.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Peter Oberparleiter <oberpar@linux.ibm.com>
Cc: Sebastian Ott <sebott@linux.ibm.com>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Yehezkel Bernat <YehezkelShB@gmail.com>
Cc: rafael@kernel.org
Acked-by: Corey Minyard <minyard@acm.org>
Acked-by: David Kershner <david.kershner@unisys.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de> # for the I2C parts
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# de76cda2 04-Jun-2019 Gustavo Pimentel <Gustavo.Pimentel@synopsys.com>

PCI: Decode PCIe 32 GT/s link speed

PCIe r5.0, sec 7.5.3.18, defines a new 32.0 GT/s bit in the Supported Link
Speeds Vector of Link Capabilities 2. Decode this new speed. This does
not affect the speed of the link, which should be negotiated automatically
by the hardware; it only adds decoding when showing the speed to the user.

Previously, reading the speed of a link operating at this speed showed
"Unknown speed" instead of "32.0 GT/s".

Link: https://lore.kernel.org/lkml/92365e3caf0fc559f9ab14bcd053bfc92d4f661c.1559664969.git.gustavo.pimentel@synopsys.com
Signed-off-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 34c6b710 19-Apr-2019 Mohan Kumar <mohankumar718@gmail.com>

PCI: Replace dev_printk(KERN_DEBUG) with dev_info(), etc

Replace dev_printk(KERN_DEBUG) with dev_info(), etc to be more consistent
with other logging and avoid checkpatch warnings.

The KERN_DEBUG messages could be converted to dev_dbg(), but that depends
on CONFIG_DYNAMIC_DEBUG and DEBUG, and we want most of these messages to
*always* be in the dmesg log.

Link: https://lore.kernel.org/lkml/1555733240-19875-1-git-send-email-mohankumar718@gmail.com
Signed-off-by: Mohan Kumar <mohankumar718@gmail.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e80a91ad 03-May-2019 Srinath Mannam <srinath.mannam@broadcom.com>

PCI: Add dma_ranges window list

Add a dma_ranges field in PCI host bridge structure to hold resource
entries list of memory regions in sorted order representing memory ranges
that can be accessed through DMA transactions.

Based-on-a-patch-by: Oza Pawandeep <oza.oza@broadcom.com>
Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
[lorenzo.pieralisi@arm.com: updated commit log]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# f873c51a 08-Feb-2019 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI/ACPI: Implement _HPX Type 3 Setting Record

The _HPX Type 3 Setting Record is intended to be more generic and allow
configuration of settings not possible with Type 2 records. For example,
firmware could ensure that the completion timeout value is set accordingly
throughout the PCI tree.

Implement support for _HPX Type 3 Setting Records, which were added in the
ACPI 6.3 spec.

Link: https://lore.kernel.org/lkml/20190208162414.3996-4-mr.nuke.me@gmail.com
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 87fcf12e 19-Apr-2019 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI/ACPI: Remove the need for 'struct hotplug_params'

We used to first parse all the _HPP and _HPX tables before using the
information to program registers of PCIe devices. Up through HPX Type 2,
there was only one structure of each type, so we could cheat and store it
on the stack.

With HPX Type 3 we get an arbitrary number of entries, so the above model
doesn't scale that well. Instead of parsing all tables at once, parse and
program each entry separately. For _HPP and _HPX Types 0 through 2, this
is functionally equivalent. The change enables the upcoming _HPX Type 3 to
integrate more easily.

Link: https://lore.kernel.org/lkml/20190208162414.3996-3-mr.nuke.me@gmail.com
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: fix build errors]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 2dbce590 19-Nov-2018 Subbaraya Sundeep <sbhatta@marvell.com>

PCI: Assign bus numbers present in EA capability for bridges

The "Enhanced Allocation (EA) for Memory and I/O Resources" ECN, approved
23 October 2014, sec 6.9.1.2, specifies a second DW in the capability for
type 1 (bridge) functions to describe fixed secondary and subordinate bus
numbers. This ECN was included in the PCIe r4.0 spec, but sec 6.9.1.2 was
omitted, presumably by mistake.

Read fixed bus numbers from the EA capability for bridges.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
[bhelgaas: add pci_ea_fixed_busnrs() return value]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6302bf3e 18-Mar-2019 Jean-Philippe Brucker <jean-philippe@linaro.org>

PCI: Init PCIe feature bits for managed host bridge alloc

Two functions allocate a host bridge: devm_pci_alloc_host_bridge() and
pci_alloc_host_bridge(). At the moment, only the unmanaged one initializes
the PCIe feature bits, which prevents from using features such as hotplug
or AER on some systems, when booting with device tree. Make the
initialization code common.

Fixes: 02bfeb484230 ("PCI/portdrv: Simplify PCIe feature permission checking")
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.17+


# 0fa635ae 19-Mar-2019 Lukas Wunner <lukas@wunner.de>

PCI/LINK: Deduplicate bandwidth reports for multi-function devices

If a multi-function device's bandwidth is already limited when it is
enumerated, a message is logged only for function 0. By contrast, when
downtraining occurs after enumeration, a message is logged for all
functions. That's because the former uses pcie_report_downtraining(),
whereas the latter uses __pcie_print_link_status() (which doesn't filter
functions != 0). I am seeing this happen on a MacBookPro9,1 with a GPU
(function 0) and an integrated HDA controller (function 1).

Avoid this incongruence by calling pcie_report_downtraining() in both
cases.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alexandru Gagniuc <alex.gagniuc@dellteam.com>


# 10ecc818 04-Jan-2019 Bjorn Helgaas <bhelgaas@google.com>

PCI/ASPM: Use LTR if already enabled by platform

RussianNeuroMancer reported that the Intel 7265 wifi on a Dell Venue 11 Pro
7140 table stopped working after wakeup from suspend and bisected the
problem to 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't
have LTR"). David Ward reported the same problem on a Dell Latitude 7350.

After af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before
using it"), we don't enable LTR unless the platform has granted LTR control
to us. In addition, we don't notice if the platform had already enabled
LTR itself.

After 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have
LTR"), we avoid using LTR if we don't think the path to the device has LTR
enabled.

The combination means that if the platform itself enables LTR but declines
to give the OS control over LTR, we unnecessarily avoided using ASPM L1.2.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201469
Fixes: 9ab105deb60f ("PCI/ASPM: Disable ASPM L1.2 Substate if we don't have LTR")
Fixes: af8bb9f89838 ("PCI/ACPI: Request LTR control from platform before using it")
Reported-by: RussianNeuroMancer <russianneuromancer@ya.ru>
Reported-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.18+


# b4f6dcb9 14-Nov-2018 Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>

PCI: Enable SERR# forwarding for all bridges

As per Figure 6-3 in PCIe r4.0, sec 6.2.6, ERR_ messages will be forwarded
from the secondary interface to the primary interface, if the SERR# Enable
bit in the Bridge Control register is set.

It seems clear that an ACPI hotplug parameter method (_HPP or _HPX) that
tells us to "enable SERR in the command register" (ACPI v6.2, sec 6.2.8,
6.2.9.1) refers to PCI_COMMAND_SERR, which enables reporting of errors by
the function itself.

For bridges, we also interpreted that to mean we should enable
PCI_BRIDGE_CTL_SERR, which enables *forwarding* of errors by the bridge.
But we didn't enable PCI_BRIDGE_CTL_SERR anywhere else, which means we
never enabled it for non-ACPI systems or ACPI systems that didn't supply
hotplug parameters.

That means errors reported below bridges were often never forwarded up to a
Root Port where they could be signaled via AER.

Enable PCI_BRIDGE_CTL_SERR for all bridges so we can get better error
reporting for downstream devices.

Signed-off-by: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b2fb5cc5 16-Oct-2018 Honghui Zhang <honghui.zhang@mediatek.com>

PCI: Rely on config space header type, not class code

The PCI configuration space header type tells us whether the device is a
bridge, a CardBus bridge, or a normal device, and defines the layout of the
rest of the header (PCI r3.0 sec 6.1, PCIe r4.0 sec 7.5.1.1.9).

When we rely on the header format, e.g., when we're dealing with bridge
windows, we should check the header type, not the class code. The class
code is loosely related to the header type, but is often incorrect and the
spec doesn't actually require it to be related to the header format.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
[bhelgaas: changelog, keep the PCI_CLASS_BRIDGE_HOST check]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 01b37f85 12-Oct-2018 Changbin Du <changbin.du@intel.com>

PCI: Make pci_size() return real BAR size

Currently, the pci_size() function actually returns 'size-1'. Make it
return real size to avoid confusion.

Signed-off-by: Du Changbin <changbin.du@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 51c48b31 19-Jan-2019 Bjorn Helgaas <bhelgaas@google.com>

PCI: Probe bridge window attributes once at enumeration-time

pci_bridge_check_ranges() determines whether a bridge supports the optional
I/O and prefetchable memory windows and sets the flag bits in the bridge
resources. This *could* be done once during enumeration except that the
resource allocation code completely clears the flag bits, e.g., in the
pci_assign_unassigned_bridge_resources() path.

The problem with pci_bridge_check_ranges() in the resource allocation path
is that we may allocate resources after devices have been claimed by
drivers, and pci_bridge_check_ranges() *changes* the window registers to
determine whether they're writable. This may break concurrent accesses to
devices behind the bridge.

Add a new pci_read_bridge_windows() to determine whether a bridge supports
the optional windows, call it once during enumeration, remember the
results, and change pci_bridge_check_ranges() so it doesn't touch the
bridge windows but sets the flag bits based on those remembered results.

Link: https://lore.kernel.org/linux-pci/1506151482-113560-1-git-send-email-wangzhou1@hisilicon.com
Link: https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02082.html
Reported-by: Yandong Xu <xuyandong2@huawei.com>
Tested-by: Yandong Xu <xuyandong2@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Ofer Hayut <ofer@lightbitslabs.com>
Cc: Roy Shterman <roys@lightbitslabs.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Zhou Wang <wangzhou1@hisilicon.com>


# 617654aa 15-Aug-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI / ACPI: Identify untrusted PCI devices

A malicious PCI device may use DMA to attack the system. An external
Thunderbolt port is a convenient point to attach such a device. The OS
may use IOMMU to defend against DMA attacks.

Some BIOSes mark these externally facing root ports with this
ACPI _DSD [1]:

Name (_DSD, Package () {
ToUUID ("efcc06cc-73ac-4bc3-bff0-76143807c389"),
Package () {
Package () {"ExternalFacingPort", 1},
Package () {"UID", 0 }
}
})

If we find such a root port, mark it and all its children as untrusted.
The rest of the OS may use this information to enable DMA protection
against malicious devices. For instance the device may be put behind an
IOMMU to keep it from accessing memory outside of what the driver has
allocated for it.

While at it, add a comment on top of prp_guids array explaining the
possible caveat resulting when these GUIDs are treated equivalent.

[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>


# 975bb8b4 11-Oct-2018 KarimAllah Ahmed <karahmed@amazon.de>

PCI/IOV: Use VF0 cached config space size for other VFs

Cache the config space size from VF0 and use it for all other VFs instead
of reading it from the config space of each VF. We assume that it will be
the same across all associated VFs.

This is an optimization when enabling SR-IOV on a device with many VFs.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[bhelgaas: use CONFIG_PCI_IOV (not CONFIG_PCI_ATS)]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b0da3498 09-Oct-2018 Christoph Hellwig <hch@lst.de>

PCI: Remove pci_set_dma_max_seg_size()

The few callers can just use dma_set_max_seg_size ()directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a6f44cf9 09-Oct-2018 Christoph Hellwig <hch@lst.de>

PCI: Remove pci_set_dma_seg_boundary()

The two callers can just use dma_set_seg_boundary() directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f0157160 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI: Make link active reporting detection generic

The spec has timing requirements when waiting for a link to become active
after a conventional reset. Implement those hard delays when waiting for
an active link so pciehp and dpc drivers don't need to duplicate this.

For devices that don't support data link layer active reporting, wait the
fixed time recommended by the PCIe spec.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# c6635792 30-Aug-2018 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI: Allocate dma_alias_mask with bitmap_zalloc()

Switch to bitmap_zalloc() to show clearly what we are allocating. Besides
that it returns pointer of bitmap type ("unsigned long *") instead of the
opaque "void *".

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9d27e39d 10-Sep-2018 Felix Kuehling <Felix.Kuehling@amd.com>

PCI: Fix enabling of PASID on RC integrated endpoints

Set the eetlp_prefix_path on PCIE_EXP_TYPE_RC_END devices to allow PASID
to be enabled on them. This fixes IOMMUv2 initialization on AMD Carrizo
APUs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201079
Fixes: 7ce3f912ae ("PCI: Enable PASID only if entire path supports End-End TLP prefixes")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9f0e8935 13-Aug-2018 Myron Stowe <myron.stowe@redhat.com>

PCI: Match Root Port's MPS to endpoint's MPSS as necessary

In commit 27d868b5e6cf ("PCI: Set MPS to match upstream bridge"), we made
sure every device's MPS setting matches its upstream bridge, making it more
likely that a hot-added device will work in a system with an optimized MPS
configuration.

Recently I've started encountering systems where the endpoint device's MPSS
capability is less than its Root Port's current MPS value, thus the
endpoint is not capable of matching its upstream bridge's MPS setting (see:
bugzilla via "Link:" below). This leaves the system vulnerable - the
upstream Root Port could respond with larger TLPs than the device can
handle, and the device will consider them to be 'Malformed'.

One could use the "pci=pcie_bus_safe" kernel parameter to work around the
issue, but that forces a user to supply a kernel parameter to get the
system to function reliably and may end up limiting MPS settings of other
unrelated, sub-topologies which could benefit from maintaining their larger
values.

Augment Keith's approach to include tuning down a Root Port's MPS setting
when its hot-added endpoint device is not capable of matching it.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jon Mason <jdmason@kudzu.us>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>


# 3dbe97ef 13-Aug-2018 Myron Stowe <myron.stowe@redhat.com>

PCI: Skip MPS logic for Virtual Functions (VFs)

PCIe r4.0, sec 9.3.5.4, "Device Control Register", shows both
Max_Payload_Size (MPS) and Max_Read_request_Size (MRRS) to be 'RsvdP' for
VFs. Just prior to the table it states:

"PF and VF functionality is defined in Section 7.5.3.4 except where
noted in Table 9-16. For VF fields marked 'RsvdP', the PF setting
applies to the VF."

All of which implies that with respect to Max_Payload_Size Supported
(MPSS), MPS, and MRRS values, we should not be paying any attention to the
VF's fields, but rather only to the PF's. Only looking at the PF's fields
also logically makes sense as it's the sole physical interface to the PCIe
bus.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200527
Fixes: 27d868b5e6cf ("PCI: Set MPS to match upstream bridge")
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # 4.3+
Cc: Keith Busch <keith.busch@intel.com>
Cc: Sinan Kaya <okaya@kernel.org>
Cc: Dongdong Liu <liudongdong3@huawei.com>
Cc: Jon Mason <jdmason@kudzu.us>


# 2d1ce5ec 06-Aug-2018 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI: Check for PCIe Link downtraining

When both ends of a PCIe Link are capable of a higher bandwidth than is
currently in use, the Link is said to be "downtrained". A downtrained Link
may indicate hardware or configuration problems in the system, but it's
hard to identify such Links from userspace.

Refactor pcie_print_link_status() so it continues to always print PCIe
bandwidth information, as several NIC drivers desire.

Add a new internal __pcie_print_link_status() to emit a message only when a
device's bandwidth is constrained by the fabric and call it from the PCI
core for all devices, which identifies all downtrained Links. It also
emits messages for a few cases that are technically not downtrained, such
as a x4 device in an open-ended x1 slot.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: changelog, move __pcie_print_link_status() declaration to
drivers/pci/, rename pcie_check_upstream_link() to
pcie_report_downtraining()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ce29af2a 25-Jul-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove unnecessary include of <linux/pci-aspm.h>

Several PCI core files include pci-aspm.h even though they don't need
anything provided by that file. Remove the unnecessary includes of it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# 44bda4b7 03-Jul-2018 Hari Vyas <hari.vyas@broadcom.com>

PCI: Fix is_added/is_busmaster race condition

When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.

When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.

is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.

A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.

Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.

Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>


# db89ccbe 30-Jun-2018 Rajat Jain <rajatja@google.com>

PCI/AER: Define aer_stats structure for AER capable devices

Define a structure to hold the AER statistics. There are 2 groups of
statistics: dev_* counters that are to be collected for all AER capable
devices and rootport_* counters that are collected for all (AER capable)
rootports only. Allocate and free this structure when device is added or
released (thus counters survive the lifetime of the device).

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# aa667c64 09-Jul-2018 James Puthukattukaran <james.puthukattukaran@oracle.com>

PCI: Workaround IDT switch ACS Source Validation erratum

Some IDT switches incorrectly flag an ACS Source Validation error on
completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation. Here's
the text of IDT 89H32H8G3-YC, erratum #36:

Item #36 - Downstream port applies ACS Source Validation to Completions
Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
completions are never affected by ACS Source Validation. However,
completions received by a downstream port of the PCIe switch from a
device that has not yet captured a PCIe bus number are incorrectly
dropped by ACS Source Validation by the switch downstream port.

Workaround: Issue a CfgWr1 to the downstream device before issuing the
first CfgRd1 to the device. This allows the downstream device to capture
its bus number; ACS Source Validation no longer stops completions from
being forwarded by the downstream port. It has been observed that
Microsoft Windows implements this workaround already; however, some
versions of Linux and other operating systems may not.

When doing the first config read to probe for a device, if the device is
behind an IDT switch with this erratum:

1. Disable ACS Source Validation if enabled
2. Wait for device to become ready to accept config accesses (by using
the Config Request Retry Status mechanism)
3. Do a config write to the endpoint
4. Enable ACS Source Validation (if it was enabled to begin with)

The workaround suggested by IDT is basically only step 3, but we don't know
when the device is ready to accept config requests. That means we need to
do config reads until we receive a non-Config Request Retry Status, which
means we need to disable ACS SV temporarily.

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
[bhelgaas: changelog, clean up whitespace, fold in unused variable fix
from Anders Roxell <anders.roxell@linaro.org>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>


# 7ce3f912 30-Jun-2018 Sinan Kaya <okaya@codeaurora.org>

PCI: Enable PASID only if entire path supports End-End TLP prefixes

A PCIe endpoint carries the process address space identifier (PASID) in
the TLP prefix as part of the memory read/write transaction. The address
information in the TLP is relevant only for a given PASID context.

An IOMMU takes PASID value and the address information from the
TLP to look up the physical address in the system.

PASID is an End-End TLP Prefix (PCIe r4.0, sec 6.20). Sec 2.2.10.2 says

It is an error to receive a TLP with an End-End TLP Prefix by a
Receiver that does not support End-End TLP Prefixes. A TLP in
violation of this rule is handled as a Malformed TLP. This is a
reported error associated with the Receiving Port (see Section 6.2).

Prevent error condition by proactively requiring End-End TLP prefix to be
supported on the entire data path between the endpoint and the root port
before enabling PASID.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 11eb0e0e 04-Jun-2018 Sinan Kaya <okaya@codeaurora.org>

PCI: Make early dump functionality generic

Move early dump functionality into common code so that it is available for
all architectures. No need to carry arch-specific reads around as the read
hooks are already initialized by the time pci_setup_device() is getting
called during scan.

Tested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>


# e412d63d 24-May-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Improve "partially hidden behind bridge" log message

pci_scan_child_bus_extend() complains when we assign an unreachable
secondary bus number to a bridge. For example, given the topology below:

+-1b.0-[01-39]----00.0-[02-3a]--+-00.0-[03]----00.0
+-01.0-[04-39]--
\-02.0-[3a]----00.0

it logs the following messages:

pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:02 [bus 02-39]
pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:01 [bus 01-39]

These messages are incorrect (0000:02 is a bus, not a bridge) and
confusing. Make the message more understandable:

pci 0000:02:02.0: devices behind bridge are unusable because [bus 3a] cannot be assigned for them

Also, remove the reference to CardBus, because this issue affects all
varieties of PCI, not just CardBus.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 70f7880d 28-May-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Improve pci_scan_bridge() and pci_scan_bridge_extend() doc

It is not immediately clear what the two functions actually return so
add kernel-doc comment explaining it a bit better.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 3374c545 28-May-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Account for all bridges on bus when distributing bus numbers

When distributing extra bus number space to hotplug bridges for future
extension, we don't account for the fact that there might be non-hotplug
bridges on the bus after the hotplug bridges. For example:

01:00.0 --+- 02:00.0 (HotPlug-) -- Thunderbolt host controller
+- 02:01.0 (HotPlug+)
\- 02:02.0 (HotPlug-) -- xHCI host controller

pci_scan_child_bus_extend() is supposed to distribute the remaining bus
numbers to the hotplug bridge at 02:01.0, but only after accounting for all
bridges on bus 02. Since we don't check whether there's another
non-hotplug bridge after the hotplug bridge 02:01.0, it may not leave space
for the non-hotplug bridge:

pci 0000:00:1b.0: PCI bridge to [bus 01-39] (Root Port)
pci 0000:01:00.0: PCI bridge to [bus 02-39]
...
pci 0000:02:00.0: PCI bridge to [bus 03]
pci 0000:02:01.0: PCI bridge to [bus 04]
pci_bus 0000:04: [bus 04-39] extended by 0x35
pci_bus 0000:04: bus scan returning with max=39
pci_bus 0000:04: busn_res: [bus 04-39] end is updated to 39
pci 0000:02:02.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:3a: scanning bus
pci_bus 0000:3a: bus scan returning with max=3a
pci_bus 0000:3a: busn_res: [bus 3a] end is updated to 3a
pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:02 [bus 02-39]
pci_bus 0000:3a: [bus 3a] partially hidden behind bridge 0000:01 [bus 01-39]
pci_bus 0000:02: bus scan returning with max=3a
pci_bus 0000:02: busn_res: [bus 02-39] end can not be updated to 3a

The resulting 'lspci -t' output looks like this:

+-1b.0-[01-39]----00.0-[02-3a]--+-00.0-[03]----00.0
^^ +-01.0-[04-39]--
\-02.0-[3a]----00.0
^^
The xHCI host controller behind 02:02.0 is not usable because it would have
to be assigned bus 3a, which is not accessible through 00:1b.0.

To fix this, reserve at least one bus for each bridge while scanning
already configured bridges. Then use this information in the second
scan to correct the available extra bus space for hotplug bridges.

After this change the 'lspci -t' output is what is expected:

+-1b.0-[01-39]----00.0-[02-39]--+-00.0-[03]----00.0
+-01.0-[04-38]--
\-02.0-[39]----00.0

The xHCI controller is now on bus 39, where it is usable.

Fixes: 1c02ea810065 ("PCI: Distribute available buses to hotplug-capable bridges")
Reported-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org


# 1df81a6d 23-May-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: shpchp: Request SHPC control via _OSC when adding host bridge

The SHPC driver now must be builtin (it cannot be a module). If it is
present, request SHPC control immediately when adding the ACPI host bridge.
This is similar to how we handle native PCIe hotplug via pciehp.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9310f0dc 23-May-2018 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: pciehp: Rename host->native_hotplug to host->native_pcie_hotplug

Rename host->native_hotplug to host->native_pcie_hotplug to make room for a
similar flag for SHPC hotplug.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3bbce531 15-May-2018 Jan Kiszka <jan.kiszka@siemens.com>

PCI: Fix devm_pci_alloc_host_bridge() memory leak

Fix a memory leak by freeing the PCI resource list in
devm_pci_release_host_bridge_dev().

Fixes: 5c3f18cce083 ("PCI: Add devm_pci_alloc_host_bridge() interface")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 17e8f0d4 03-May-2018 Gilles Buloz <Gilles.Buloz@kontron.com>

PCI: Check whether bridges allow access to extended config space

Even if a device supports extended config space, i.e., it is a PCI-X Mode 2
or a PCI Express device, the extended space may not be accessible if
there's a conventional PCI bus in the path to it.

We currently figure that out in pci_cfg_space_size() by reading the first
dword of extended config space. On most platforms that returns ~0 data if
the space is inaccessible, but it may set error bits in PCI status
registers, and on some platforms it causes exceptions that we currently
don't recover from.

For example, a PCIe-to-conventional PCI bridge treats config transactions
with a non-zero Extended Register Address as an Unsupported Request on PCIe
and a received Master-Abort on the destination bus (see PCI Express to
PCI/PCI-X Bridge spec, r1.0, sec 4.1.3).

A sample case is a LS1043A CPU (NXP QorIQ Layerscape) platform with the
following bus topology:

LS1043 PCIe Root Port
-> PEX8112 PCIe-to-PCI bridge (doesn't support ext cfg on PCI side)
-> PMC slot connector (for legacy PMC modules)

With a PMC module topology as follows:

PMC connector
-> PCI-to-PCIe bridge
-> PCIe switch (4 ports)
-> 4 PCIe devices (one on each port)

The PCIe devices on the PMC module support extended config space, but we
can't reach it because the PEX8112 can't generate accesses to the extended
space on its secondary bus. Attempts to access it cause Unsupported
Request errors, which result in synchronous aborts on this platform.

To avoid these errors, check whether bridges are capable of generating
extended config space addresses on their secondary interfaces. If they
can't, we restrict devices below the bridge to only the 256-byte
PCI-compatible config space.

Signed-off-by: Gilles Buloz <gilles.buloz@kontron.com>
[bhelgaas: changelog, rework patch so bus_flags testing is all in
pci_bridge_child_ext_cfg_accessible()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# af8bb9f8 17-Apr-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI/ACPI: Request LTR control from platform before using it

Per the PCI Firmware spec r3.2, sec 4.5, an ACPI-based OS should use _OSC
to request control of Latency Tolerance Reporting (LTR) before using it.

Request control of LTR, and if the platform does not grant control, don't
use it.

N.B. If the hardware supports LTR and the ASPM L1.2 substate but the BIOS
doesn't support LTR in _OSC, we previously would enable ASPM L1.2. This
patch will prevent us from enabling ASPM L1.2 in that case. It does not
prevent us from enabling PCI-PM L1.2, since that doesn't depend on LTR.
See PCIe r40, sec 5.5.1, for the L1 PM substate entry conditions.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# cf0921be 19-Mar-2018 KarimAllah Ahmed <karahmed@amazon.de>

PCI/IOV: Use VF0 cached config registers for other VFs

Cache some config data from VF0 and use it for all other VFs instead of
reading it from the config space of each VF. We assume these items are the
same across all associated VFs:

Revision ID
Class Code
Subsystem Vendor ID
Subsystem ID

This is an optimization when enabling SR-IOV on a device with many VFs.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[bhelgaas: changelog, simplify comments, remove unused "device", test
CONFIG_PCI_IOV instead of CONFIG_PCI_ATS, rename functions]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>


# 02bfeb48 09-Mar-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI/portdrv: Simplify PCIe feature permission checking

Some PCIe features (AER, DPC, hotplug, PME) can be managed by either the
platform firmware or the OS, so the host bridge driver may have to request
permission from the platform before using them. On ACPI systems, this is
done by negotiate_os_control() in acpi_pci_root_add().

The PCIe port driver later uses pcie_port_platform_notify() and
pcie_port_acpi_setup() to figure out whether it can use these features.
But all we need is a single bit for each service, so these interfaces are
needlessly complicated.

Simplify this by adding bits in the struct pci_host_bridge to show when the
OS has permission to use each feature:

+ unsigned int native_aer:1; /* OS may use PCIe AER */
+ unsigned int native_hotplug:1; /* OS may use PCIe hotplug */
+ unsigned int native_pme:1; /* OS may use PCIe PME */

These are set when we create a host bridge, and the host bridge driver can
clear the bits corresponding to any feature the platform doesn't want us to
use.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1acfb9b7 12-Mar-2018 Jay Fang <f.fangjian@huawei.com>

PCI: Add decoding for 16 GT/s link speed

PCIe 4.0 defines the 16.0 GT/s link speed. Links can run at that speed
without any Linux changes, but previously their sysfs "max_link_speed" and
"current_link_speed" files contained "Unknown speed", not the expected
"16.0 GT/s".

Add decoding for the new 16 GT/s link speed.

Signed-off-by: Jay Fang <f.fangjian@huawei.com>
[bhelgaas: add PCI_EXP_LNKCAP2_SLS_16_0GB]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dongdong Liu <liudongdong3@huawei.com>


# bf4447fd 02-Mar-2018 KarimAllah Ahmed <karahmed@amazon.de>

PCI/IOV: Skip BAR sizing for VFs

Per PCIe r4.0, sec 9.3.4.1.11, the BAR registers in VF config space are all
RO Zero, so skip sizing them.

This is an optimization when enabling SR-IOV on a device with many VFs.

Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# df62ab5e 09-Mar-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI: Tidy comments

Remove pointless comments that tell us the file name, remove blank line
comments, follow multi-line comment conventions. No functional change
intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 690f4304 07-Mar-2018 Jan Kiszka <jan.kiszka@siemens.com>

PCI: Scan all functions when running over Jailhouse

Per PCIe r4.0, sec 7.5.1.1.9, multi-function devices are required to have a
function 0. Therefore, Linux scans for devices at function 0 (devfn
0/8/16/...) and only scans for other functions if function 0 has its
Multi-Function Device bit set or ARI or SR-IOV indicate there are more
functions.

The Jailhouse hypervisor may pass individual functions of a multi-function
device to a guest without passing function 0, which means a Linux guest
won't find them.

Change Linux PCI probing so it scans all function numbers when running as a
guest over Jailhouse.

This is technically prohibited by the spec, so it is possible that PCI
devices without the Multi-Function Device bit set may have unexpected
behavior in response to this probe.

Originally-by: Benedikt Spranger <b.spranger@linutronix.de>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: jailhouse-dev@googlegroups.com
Cc: Benedikt Spranger <b.spranger@linutronix.de>
Cc: linux-pci@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Link: https://lkml.kernel.org/r/06e279b2a3e06cf6689ab3975f8ab592bba02362.1520408357.git.jan.kiszka@siemens.com


# be20f6b0 17-Jan-2018 KarimAllah Ahmed <karahmed@amazon.de>

PCI/IOV: Skip INTx config reads for VFs

Per PCIe r4.0, sec 9.2.1.4, VFs can not implement INTX, and their Interrupt
Line and Interrupt Pin registers must be RO Zero. Some devices have
thousands of VFs, so skip reading the registers as an optimization.

Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
[bhelgaas: changelog, comment]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>


# 5b0764ca 16-Feb-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI: Probe for device reset support during enumeration

Previously we called pci_probe_reset_function() in this path:

pci_sysfs_init # late_initcall
for_each_pci_dev(dev)
pci_create_sysfs_dev_files(dev)
pci_create_capabilities_sysfs(dev)
pci_probe_reset_function
pci_dev_specific_reset
pcie_has_flr
pcie_capability_read_dword

pci_sysfs_init() is a late_initcall, and a driver may have already claimed
one of these devices and enabled runtime power management for it, so the
device could already be in D3 by the time we get to pci_sysfs_init().

The device itself should respond to the config read even while it's in
D3hot, but if an upstream bridge is also in D3hot, the read won't even
reach the device because the bridge won't forward it downstream to the
device. If the bridge is a PCIe port, it should complete the read as an
Unsupported Request, which may be reported to the CPU as an exception or as
invalid data.

Avoid this case by probing for reset support from pci_init_capabilities(),
before a driver can claim the device. The device may be in D3hot, but any
bridges leading to it should be in D0, so the device's config space should
be fully accessible at that point.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 49b8e3f3 30-Jan-2018 Cyrille Pitchen <cyrille.pitchen@free-electrons.com>

PCI: Add generic function to probe PCI host controllers

This patchs moves generic source code from
drivers/pci/host/pci-host-common.c into drivers/pci/probe.c.

Indeed the extracted lines of code were duplicated by many host
controller drivers. Regrouping them into a generic function gives a
change to properly share this code without introducing a useless
dependency to PCI_HOST_COMMON, which selects PCI_ECAM when not needed by
most host controller drivers.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@free-electrons.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>


# 7328c8f4 26-Jan-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI: Add SPDX GPL-2.0 when no license was specified

b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to
files with no license") added SPDX GPL-2.0 to several PCI files that
previously contained no license information.

Add SPDX GPL-2.0 to all other PCI files that did not contain any license
information and hence were under the default GPL version 2 license of the
kernel.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 7506dc79 17-Jan-2018 Frederick Lawler <fred@fredlawl.com>

PCI: Add wrappers for dev_printk()

Add PCI-specific dev_printk() wrappers and use them to simplify the code
slightly. No functional change intended.

Signed-off-by: Frederick Lawler <fred@fredlawl.com>
[bhelgaas: squash into one patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3e466e2d 30-Nov-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Tidy up pci/probe.c comments

No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d57f0b8c 30-Nov-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports

PCIe Downstream Ports normally have only a Device 0 below them. To
optimize enumeration, we don't scan for other devices *unless* the
PCI_SCAN_ALL_PCIE_DEVS flag is set by set by quirks or the
"pci=pcie_scan_all" kernel parameter.

Previously PCI_SCAN_ALL_PCIE_DEVS only affected scanning below Switch
Downstream Ports, not Root Ports.

But the "Nemo" system, also known as the AmigaOne X1000, has a PA Semi Root
Port whose link leads to an AMD/ATI SB600 South Bridge. The Root Port is a
PCIe device, of course, but the SB600 contains only conventional PCI
devices with no visible PCIe port.

Simplify and restructure only_one_child() so that we scan for all possible
devices below Root Ports as well as Switch Downstream Ports when
PCI_SCAN_ALL_PCIE_DEVS is set.

This is enough to make Nemo work with "pci=pcie_scan_all". We would also
like to add a quirk to set PCI_SCAN_ALL_PCIE_DEVS automatically on Nemo so
users wouldn't have to use the "pci=pcie_scan_all" parameter, but we don't
have that yet.

Link: https://lkml.kernel.org/r/CAErSpo55Q8Q=5p6_+uu7ahnw+53ibVDNRXxrzRV9QnUr_9EUfw@mail.gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198057
Reported-and-Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c46fd358 28-Nov-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI/ASPM: Enable Latency Tolerance Reporting when supported

Enable Latency Tolerance Reporting (LTR). Note that LTR must be enabled in
the Root Port first, and must not be enabled in any downstream device
unless the Root Port and all intermediate Switches also support LTR.
See PCIe r3.1, sec 6.18.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>


# 1c02ea81 13-Oct-2017 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Distribute available buses to hotplug-capable bridges

System BIOS sometimes allocates extra bus space for hotplug-capable PCIe
root/downstream ports. This space is needed if the device plugged to the
port will have more hotplug-capable downstream ports. A good example of
this is Thunderbolt. Each Thunderbolt device contains a PCIe switch and
one or more hotplug-capable PCIe downstream ports where the daisy chain
can be extended.

Currently Linux only allocates minimal bus space to make sure all the
enumerated devices barely fit there. The BIOS reserved extra space is
not taken into consideration at all. Because of this we run out of bus
space pretty quickly when more PCIe devices are attached to hotplug
downstream ports in order to extend the chain.

Modify the PCI core so we distribute the available BIOS allocated bus space
equally between hotplug-capable bridges to make sure there is enough bus
space for extending the hierarchy later on.

Update kernel docs of the affected functions.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a20c7f36 13-Oct-2017 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Do not allocate more buses than available in parent

One can ask more buses to be reserved for hotplug bridges by passing
pci=hpbussize=N in the kernel command line. If the parent bus does not
have enough bus space available we incorrectly create child bus with the
requested number of subordinate buses.

In the example below hpbussize is set to one more than we have available
buses in the root port:

pci 0000:07:00.0: [8086:1578] type 01 class 0x060400
pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 0
pci 0000:07:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 1
pci_bus 0000:08: busn_res: can not insert [bus 08-ff] under [bus 07-3f] (conflicts with (null) [bus 07-3f])
pci_bus 0000:08: scanning bus
...
pci_bus 0000:0a: bus scan returning with max=40
pci_bus 0000:0a: busn_res: [bus 0a-ff] end is updated to 40
pci_bus 0000:0a: [bus 0a-40] partially hidden behind bridge 0000:07 [bus 07-3f]
pci_bus 0000:08: bus scan returning with max=40
pci_bus 0000:08: busn_res: [bus 08-ff] end is updated to 40

Instead of allowing this, limit the subordinate number to be less than or
equal the maximum subordinate number allocated for the parent bus (if it
has any).

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: remove irrelevant dmesg messages]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 4147c2fd 13-Oct-2017 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Open-code the two pass loop when scanning bridges

The current scanning code is really hard to understand because it calls
the same function in a loop where pass value is changed without any
comments explaining it:

for (pass = 0; pass < 2; pass++)
for_each_pci_bridge(dev, bus)
max = pci_scan_bridge(bus, dev, max, pass);

Unfamiliar reader cannot tell easily what is the purpose of this loop
without looking at internals of pci_scan_bridge().

In order to make this bit easier to understand, open-code the loop in
pci_scan_child_bus() and pci_hp_add_bridge() with added comments.

No functional changes intended.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 95e3ba97 13-Oct-2017 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Move pci_hp_add_bridge() to drivers/pci/probe.c

There is not much point of having a file with a single function in it.
Instead we can just move pci_hp_add_bridge() to drivers/pci/probe.c and
make it available always when PCI core is enabled.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: convert printk to dev_err()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 24a0c654 20-Oct-2017 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI: Add for_each_pci_bridge() helper

The following pattern is often used:

list_for_each_entry(dev, &bus->devices, bus_list) {
if (pci_is_bridge(dev)) {
...
}
}

Add a for_each_pci_bridge() helper to make that code easier to write and
read by reducing indentation level. It also saves one or few lines of code
in each occurrence.

Convert PCI core parts here at the same time.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
[bhelgaas: fold in http://lkml.kernel.org/r/20171013165352.25550-1-andriy.shevchenko@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e78e661f 29-Aug-2017 Sinan Kaya <okaya@codeaurora.org>

PCI: Warn periodically while waiting for non-CRS ("device ready") status

Add a print statement in pci_bus_wait_crs() so that user observes the
progress of device polling instead of silently waiting for timeout to be
reached.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: check for timeout first so we don't print "waiting, giving up",
always print time we've slept (not the actual timeout, print a "ready"
message if we've printed a "waiting" message]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6a802ef0 29-Aug-2017 Sinan Kaya <okaya@codeaurora.org>

PCI: Factor out pci_bus_wait_crs()

Configuration Request Retry Status (CRS) was previously hidden inside
pci_bus_read_dev_vendor_id(). We want to add support for CRS in other
situations, such as waiting for a device to become ready after a Function
Level Reset.

Move CRS handling into pci_bus_wait_crs() so it can be called from other
places.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: pass pointer, not value, to pci_bus_wait_crs() so caller gets
correct Vendor ID]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 62bc6a6f 29-Aug-2017 Sinan Kaya <okaya@codeaurora.org>

PCI: Add pci_bus_crs_vendor_id() to detect CRS response data

Add pci_bus_crs_vendor_id() to determine whether data returned for a config
read of the Vendor ID indicates a Configuration Request Retry Status (CRS)
response.

Per PCIe r3.1, sec 2.3.2, this data is only returned if:

- CRS Software Visibility is enabled,
- a config read includes both bytes of the Vendor ID, and
- the read receives a CRS completion

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: changelog, change name to pci_bus_crs_vendor_id(), make static
in probe.c, use it in pci_bus_read_dev_vendor_id()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9f982756 29-Aug-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Always check for non-CRS response before timeout

While waiting for a device to become ready (i.e., to return a non-CRS
completion to a read of its Vendor ID), if we got a valid response to the
very last read before timing out, we printed a warning and gave up on the
device even though it was actually ready.

For a typical 60s timeout, we wait about 65s (it's not exact because of the
exponential backoff), but we treated devices that became ready between 33s
and 65s as though they failed.

Move the Device ID read later so we check whether the device is ready
before checking for a timeout.

Thanks to Sinan Kaya <okaya@codeaurora.org>, reorder reads so we always
check device presence after sleep, since it's pointless to sleep unless we
recheck afterwards.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a99b646a 14-Aug-2017 dingtianhong <dingtianhong@huawei.com>

PCI: Disable PCIe Relaxed Ordering if unsupported

When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targeted towards these affected root complexes.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the capability for this device.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


# bccf90d6 23-Jun-2017 Palmer Dabbelt <palmer@dabbelt.com>

PCI: Add a generic weak pcibios_fixup_bus()

Multiple architectures define this as an empty function, and I'm adding
another one as part of the RISC-V port. Add a __weak version of
pcibios_fixup_bus() and delete the now-obselete ones in a handful of
ports.

The only functional change should be that microblaze used to export
pcibios_fixup_bus(). None of the other architectures exports this, so I
just dropped it.

Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 62ce94a7 11-Jul-2017 Sinan Kaya <okaya@codeaurora.org>

PCI: Mark Broadcom HT2100 Root Port Extended Tags as broken

Per PCIe r3.1, sec 2.2.6.2 and 7.8.4, a Requester may not use 8-bit Tags
unless its Extended Tag Field Enable is set, but all Receivers/Completers
must handle 8-bit Tags correctly regardless of their Extended Tag Field
Enable.

Some devices do not handle 8-bit Tags as Completers, so add a quirk for
them. If we find such a device, we disable Extended Tags for the entire
hierarchy to make peer-to-peer DMA possible.

The Broadcom HT2100 seems to have issues with handling 8-bit tags. Mark it
as broken.

The pci_walk_bus() in the quirk handles devices we've enumerated in the
past, and pci_configure_device() handles devices we enumerate in the
future.

Fixes: 60db3a4d8cc9 ("PCI: Enable PCIe Extended Tags if supported")
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1467674
Reported-and-tested-by: Wim ten Have <wim.ten.have@oracle.com>
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
[bhelgaas: changelog, tweak messages, rename bit and quirk]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9ee8a1c4 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Remove pci_scan_root_bus_msi()

The pci_scan_root_bus_bridge() function allows passing a parameterized
struct pci_host_bridge and scanning the resulting PCI bus; since the struct
msi_controller is part of the struct pci_host_bridge and the struct
pci_host_bridge can now be passed to pci_scan_root_bus_bridge() explicitly,
there is no need for a scan interface with a MSI controller parameter.

With all PCI host controller drivers and platform code relying on
pci_scan_root_bus_msi() converted over to pci_scan_root_bus_bridge() the
pci_scan_root_bus_msi() becomes obsolete and can be removed.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 675734ba 21-Mar-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Enable ECRC only if device supports it

John reported that an Intel QuickAssist crypto accelerator didn't work in a
Dell PowerEdge R730. The problem seems to be that we enabled ECRC when the
device doesn't support it:

85:00.0 Co-processor [0b40]: Intel Corporation DH895XCC Series QAT [8086:0435]
Capabilities: [100 v1] Advanced Error Reporting
AERCap: First Error Pointer: 00, GenCap- CGenEn+ ChkCap- ChkEn+

1302fcf0d03e ("PCI: Configure *all* devices, not just hot-added ones")
exposed the problem because it applies settings from the _HPX method to all
devices, not just hot-added ones. The R730 supplies an _HPX method that
allows the kernel to enable ECRC.

Only enable ECRC if the device advertises support for it.

Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1571798
Fixes: 1302fcf0d03e ("PCI: Configure *all* devices, not just hot-added ones")
Reported-by: John Mazzie <john_mazzie@dell.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# cea9bc0b 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Make pci_register_host_bridge() PCI core internal

With the introduction of pci_scan_root_bus_bridge() there is no need to
export pci_register_host_bridge() to other kernel subsystems other than the
PCI compilation unit that needs it.

Make pci_register_host_bridge() static to its compilation unit and convert
the existing drivers usage over to pci_scan_root_bus_bridge().

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# 1228c4b6 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Add pci_scan_root_bus_bridge() interface

The current pci_scan_root_bus() interface is made up of two main code
paths:

- pci_create_root_bus()
- pci_scan_child_bus()

pci_create_root_bus() is a wrapper function that allows to create a struct
pci_host_bridge structure, initialize it with the passed parameters and
register it with the kernel.

As the struct pci_host_bridge require additional struct members,
pci_create_root_bus() parameters list has grown in time, making it unwieldy
to add further parameters to it in case the struct pci_host_bridge gains
more members fields to augment its functionality.

Since PCI core code provides functions to allocate struct pci_host_bridge,
instead of forcing the pci_create_root_bus() interface to add new
parameters to cater for new struct pci_host_bridge functionality, it is
more suitable to add an interface in PCI core code to scan a PCI bus
straight from a struct pci_host_bridge created and customized by each
specific PCI host controller driver.

Add a pci_scan_root_bus_bridge() function to allow PCI host controller
drivers to create and initialize struct pci_host_bridge and scan the
resulting bus.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# 5c3f18cc 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Add devm_pci_alloc_host_bridge() interface

Struct pci_host_bridge can be allocated by PCI host bridge drivers which
usually allocate and map memory through devm managed interfaces.

Add a devm version for the pci_alloc_host_bridge() interface to simplify
PCI host controller driver porting and simplify the driver failure paths.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# dff79b91 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Add pci_free_host_bridge() interface

Commit a52d1443bba1 ("PCI: Export host bridge registration interface")
exported the pci_alloc_host_bridge() interface so that PCI host controllers
drivers can make use of it.

Introduce pci_alloc_host_bridge() kernel counterpart to free the host
bridge data structures, pci_free_host_bridge(), export it and update kernel
functions releasing host bridge objects allocated memory to make use of it.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# a1c0050a 28-Jun-2017 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Initialize bridge release function at bridge allocation

The introduction of pci_register_host_bridge() kernel interface allows PCI
host controller drivers to create the struct pci_host_bridge object,
initialize it and register it with the kernel so that its corresponding PCI
bus can be scanned and its devices probed.

The host bridge device release function pci_release_host_bridge_dev() is a
static function common for all struct pci_host_bridge allocated objects, so
in its current form cannot be used by PCI host bridge controllers drivers
to initialize the allocated struct pci_host_bridge, which leaves struct
pci_host_bridge devices release function uninitialized.

Since pci_release_host_bridge_dev() is a function common to all PCI host
bridge objects, initialize it in pci_alloc_host_bridge() (ie common host
bridge allocation interface) so that all struct pci_host_bridge objects
have their release function initialized by default at allocation time,
removing the need for exporting the common pci_release_host_bridge_dev()
function to other compilation units.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>


# 99b3c58f 26-May-2017 Piotr Gregor <piotrgregor@rsyncme.org>

PCI: Test INTx masking during enumeration, not at run-time

The test for INTx masking via PCI_COMMAND_INTX_DISABLE performed in
pci_intx_mask_supported() should be done before the device can be used.
This is to avoid writing PCI_COMMAND while the driver owns the device, in
case that has any effect on MSI/MSI-X interrupts.

Move the content of pci_intx_mask_supported() to pci_intx_mask_broken() and
call it from pci_setup_device().

The test result can be queried at any time later using the same
pci_intx_mask_supported() interface as before (though with changed
implementation), so callers (uio, vfio) should be unaffected.

Signed-off-by: Piotr Gregor <piotrgregor@rsyncme.org>
[bhelgaas: changelog, remove quirk check, remove locking, move
dev->broken_intx_masking assignment to caller]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>


# 09515ef5 10-Apr-2017 Sricharan R <sricharan@codeaurora.org>

of/acpi: Configure dma operations at probe time for platform/amba/pci bus devices

Configuring DMA ops at probe time will allow deferring device probe when
the IOMMU isn't available yet. The dma_configure for the device is
now called from the generic device_attach callback just before the
bus/driver probe is called. This way, configuring the DMA ops for the
device would be called at the same place for all bus_types, hence the
deferred probing mechanism should work for all buses as well.

pci_bus_add_devices (platform/amba)(_device_create/driver_register)
| |
pci_bus_add_device (device_add/driver_register)
| |
device_attach device_initial_probe
| |
__device_attach_driver __device_attach_driver
|
driver_probe_device
|
really_probe
|
dma_configure

Similarly on the device/driver_unregister path __device_release_driver is
called which inturn calls dma_deconfigure.

This patch changes the dma ops configuration to probe time for
both OF and ACPI based platform/amba/pci bus devices.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> (drivers/pci part)
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>


# 76dc5268 14-Apr-2017 Matthias Kaehlcke <mka@chromium.org>

PCI: Make PCI_ROM_ADDRESS_MASK a 32-bit constant

A 64-bit value is not needed since a PCI ROM address consists in 32 bits.
This fixes a clang warning about "implicit conversion from 'unsigned long'
to 'u32'".

Also remove now unnecessary casts to u32 from __pci_read_base() and
pci_std_update_resource().

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# dc5205ef 10-Apr-2017 Marc Gonzalez <marc_gonzalez@sigmadesigns.com>

PCI: Improve __pci_read_base() robustness

Local variables 'l' and 'sz' are uninitialized. Normally, they would
be initialized by pci_read_config_dword() but when an error occurs,
some drivers immediately return an error code, which leaves the
argument uninitialized.

Provide a safe initial value to make the code more robust.

Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 8531e283 10-Mar-2017 Lukas Wunner <lukas@wunner.de>

PCI: Recognize Thunderbolt devices

Detect on probe whether a PCI device is part of a Thunderbolt controller.
Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
on such devices. Detect presence of this VSEC and cache it in a newly
added is_thunderbolt bit in struct pci_dev.

Also, add a helper to check whether a given PCI device is situated on a
Thunderbolt daisy chain (i.e., below a PCI device with is_thunderbolt
set).

The necessity arises from the following:

* If an external Thunderbolt GPU is connected to a dual GPU laptop,
that GPU is currently registered with vga_switcheroo even though it
can neither drive the laptop's panel nor be powered off by the
platform. To vga_switcheroo it will appear as if two discrete
GPUs are present. As a result, when the external GPU is runtime
suspended, vga_switcheroo will cut power to the internal discrete GPU
which may not be runtime suspended at all at this moment. The
solution is to not register external GPUs with vga_switcheroo, which
necessitates a way to recognize if they're on a Thunderbolt daisy
chain.

* Dual GPU MacBook Pros introduced 2011+ can no longer switch external
DisplayPort ports between GPUs. (They're no longer just used for DP
but have become combined DP/Thunderbolt ports.) The driver to switch
the ports, drivers/platform/x86/apple-gmux.c, needs to detect presence
of a Thunderbolt controller and, if found, keep external ports
permanently switched to the discrete GPU.

v2: Make kerneldoc for pci_is_thunderbolt_attached() more precise,
drop portion of commit message pertaining to separate series.
(Bjorn Helgaas)

Cc: Andreas Noever <andreas.noever@gmail.com>
Cc: Michael Jamet <michael.jamet@intel.com>
Cc: Tomas Winkler <tomas.winkler@intel.com>
Cc: Amir Levy <amir.jer.levy@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: http://patchwork.freedesktop.org/patch/msgid/0ab165a4a35c0b60f29d4c306c653ead14fcd8f9.1489145162.git.lukas@wunner.de


# 60db3a4d 20-Jan-2017 Sinan Kaya <okaya@codeaurora.org>

PCI: Enable PCIe Extended Tags if supported

Every PCIe device can generate 5-bit transaction Tags, which allow up to 32
concurrent requests. Some devices can generate 8-bit Extended Tags, which
allow up to 256 concurrent requests.

Per the ECN mentioned below, all PCIe Receivers are expected to support
Extended Tags, so devices are allowed (but not required) to enable them by
default.

If a device supports Extended Tags but does not enable them by default,
enable them. This allows the device to have up to 256 outstanding
transactions at a time, which may improve performance.

[bhelgaas: changelog, check for PCIe device]
Link: https://pcisig.com/sites/default/files/specification_documents/ECN_Extended_Tag_Enable_Default_05Sept2008_final.pdf
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 51ebfc92 11-Jan-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Enumerate switches below PCI-to-PCIe bridges

A PCI-to-PCIe bridge (a "reverse bridge") has a PCI or PCI-X primary
interface and a PCI Express secondary interface. The PCIe interface is a
Downstream Port that originates a Link. See the "PCI Express to PCI/PCI-X
Bridge Specification", rev 1.0, sections 1.2 and A.6.

The bug report below involves a PCI-to-PCIe bridge and a PCIe switch below
the bridge:

00:1e.0 Intel 82801 PCI Bridge to [bus 01-0a]
01:00.0 Pericom PI7C9X111SL PCIe-to-PCI Reversible Bridge to [bus 02-0a]
02:00.0 Pericom Device 8608 [PCIe Upstream Port] to [bus 03-0a]
03:01.0 Pericom Device 8608 [PCIe Downstream Port] to [bus 0a]

01:00.0 is configured as a PCI-to-PCIe bridge (despite the name printed by
lspci). As we traverse a PCIe hierarchy, device connections alternate
between PCIe Links and internal Switch logic. Previously we did not
recognize that 01:00.0 had a secondary link, so we thought the 02:00.0
Upstream Port *did* have a secondary link. In fact, it's the other way
around: 01:00.0 has a secondary link, and 02:00.0 has internal Switch logic
on its secondary side.

When we thought 02:00.0 had a secondary link, the pci_scan_slot() ->
only_one_child() path assumed 02:00.0 could have only one child, so 03:00.0
was the only possible downstream device. But 03:00.0 doesn't exist, so we
didn't look for any other devices on bus 03.

Booting with "pci=pcie_scan_all" is a workaround, but we don't want users
to have to do that.

Recognize that PCI-to-PCIe bridges originate links on their secondary
interfaces.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=189361
Fixes: d0751b98dfa3 ("PCI: Add dev->has_secondary_link to track downstream PCIe links")
Tested-by: Blake Moore <blake.moore@men.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v4.2+


# 977509f7 02-Jan-2017 Bjorn Helgaas <bhelgaas@google.com>

PCI: Apply _HPX settings only to relevant devices

Previously we didn't check the type of device before trying to apply Type 1
(PCI-X) or Type 2 (PCIe) Setting Records from _HPX.

We don't support PCI-X Setting Records, so this was harmless, but the
warning was useless.

We do support PCIe Setting Records, and we didn't check whether a device
was PCIe before applying settings. I don't think anything bad happened on
non-PCIe devices because pcie_capability_clear_and_set_word(),
pcie_cap_has_lnkctl(), etc., would fail before doing any harm. But it's
ugly to depend on those internals.

Check the device type before attempting to apply Type 1 and Type 2 Setting
Records (Type 0 records are applicable to PCI, PCI-X, and PCIe devices).

A side benefit is that this prevents useless "not supported" warnings when
a BIOS supplies a Type 1 (PCI-X) Setting Record and we try to apply it to
every single device:

pci 0000:00:00.0: PCI-X settings not supported

After this patch, we'll get the warning only when a BIOS supplies a Type 1
record and we have a PCI-X device to which it should be applied.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=187731
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a52d1443 25-Nov-2016 Thierry Reding <treding@nvidia.com>

PCI: Export host bridge registration interface

Allow PCI host bridge drivers to use the new host bridge interfaces to
register their host bridge.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>


# 59094065 25-Nov-2016 Thierry Reding <treding@nvidia.com>

PCI: Allow driver-specific data in host bridge

Provide a way to allocate driver-specific data along with a PCI host bridge
structure. The bridge's ->private field points to this data.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>


# 37d6a0a6 25-Nov-2016 Arnd Bergmann <arnd@arndb.de>

PCI: Add pci_register_host_bridge() interface

Make the existing pci_host_bridge structure a proper device that is usable
by PCI host drivers in a more standard way. In addition to the existing
pci_scan_bus(), pci_scan_root_bus(), pci_scan_root_bus_msi(), and
pci_create_root_bus() interfaces, this unfortunately means having to add
yet another interface doing basically the same thing, and add some extra
code in the initial step.

However, this time it's more likely to be extensible enough that we won't
have to do another one again in the future, and we should be able to reduce
code much more as a result.

The main idea is to pull the allocation of 'struct pci_host_bridge' out of
the registration, and let individual host drivers and architecture code
fill the members before calling the registration function.

There are a number of things we can do based on this:

* Use a single memory allocation for the driver-specific structure
and the generic PCI host bridge
* consolidate the contents of driver-specific structures by moving
them into pci_host_bridge
* Add a consistent interface for removing a PCI host bridge again
when unloading a host driver module
* Replace the architecture specific __weak pcibios_*() functions with
callbacks in a pci_host_bridge device
* Move common boilerplate code from host drivers into the generic
function, based on contents of the structure
* Extend pci_host_bridge with additional members when needed without
having to add arguments to pci_scan_*().
* Move members of struct pci_bus into pci_host_bridge to avoid
having lots of identical copies.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>


# 7a6d312b 28-Nov-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE

Remove the assumption that IORESOURCE_ROM_ENABLE == PCI_ROM_ADDRESS_ENABLE.
PCI_ROM_ADDRESS_ENABLE is the ROM enable bit defined by the PCI spec, so if
we're reading or writing a BAR register value, that's what we should use.
IORESOURCE_ROM_ENABLE is a corresponding bit in struct resource flags.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>


# d760a1ba 21-Nov-2016 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

ACPI: Implement acpi_dma_configure

On DT based systems, the of_dma_configure() API implements DMA
configuration for a given device. On ACPI systems an API equivalent to
of_dma_configure() is missing which implies that it is currently not
possible to set-up DMA operations for devices through the ACPI generic
kernel layer.

This patch fills the gap by introducing acpi_dma_configure/deconfigure()
calls that for now are just wrappers around arch_setup_dma_ops() and
arch_teardown_dma_ops() and also updates ACPI and PCI core code to use
the newly introduced acpi_dma_configure/acpi_dma_deconfigure functions.

Since acpi_dma_configure() is used to configure DMA operations, the
function initializes the dma/coherent_dma masks to sane default values
if the current masks are uninitialized (also to keep the default values
consistent with DT systems) to make sure the device has a complete
default DMA set-up.

The DMA range size passed to arch_setup_dma_ops() is sized according
to the device coherent_dma_mask (starting at address 0x0), mirroring the
DT probing path behaviour when a dma-ranges property is not provided
for the device being probed; this changes the current arch_setup_dma_ops()
call parameters in the ACPI probing case, but since arch_setup_dma_ops()
is a NOP on all architectures but ARM/ARM64 this patch does not change
the current kernel behaviour on them.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [pci]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Tomasz Nowicki <tn@semihalf.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Tomasz Nowicki <tn@semihalf.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e42010d8 23-Nov-2016 Johannes Thumshirn <jthumshirn@suse.de>

PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)

Per PCIe spec r3.0, sec 2.3.1.1, the Read Completion Boundary (RCB)
determines the naturally aligned address boundaries on which a Read Request
may be serviced with multiple Completions:

- For a Root Complex, RCB is 64 bytes or 128 bytes
This value is reported in the Link Control Register

Note: Bridges and Endpoints may implement a corresponding command bit
which may be set by system software to indicate the RCB value for the
Root Complex, allowing the Bridge/Endpoint to optimize its behavior
when the Root Complex’s RCB is 128 bytes.

- For all other system elements, RCB is 128 bytes

Per sec 7.8.7, if a Root Port only supports a 64-byte RCB, the RCB of all
downstream devices must be clear, indicating an RCB of 64 bytes. If the
Root Port supports a 128-byte RCB, we may optionally set the RCB of
downstream devices so they know they can generate larger Completions.

Some BIOSes supply an _HPX that tells us to set RCB, even though the Root
Port doesn't have RCB set, which may lead to Malformed TLP errors if the
Endpoint generates completions larger than the Root Port can handle.

The IBM x3850 X6 with BIOS version -[A8E120CUS-1.30]- 08/22/2016 supplies
such an _HPX and a Mellanox MT27500 ConnectX-3 device fails to initialize:

mlx4_core 0000:41:00.0: command 0xfff timed out (go bit not cleared)
mlx4_core 0000:41:00.0: device is going to be reset
mlx4_core 0000:41:00.0: Failed to obtain HW semaphore, aborting
mlx4_core 0000:41:00.0: Fail to reset HCA
------------[ cut here ]------------
kernel BUG at drivers/net/ethernet/mellanox/mlx4/catas.c:193!

After 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
and 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices
with a link"), we apply _HPX settings to *all* devices, not just those
hot-added after boot.

Before 7a1562d4f2d0, we didn't touch the Mellanox RCB, and the device
worked. After 7a1562d4f2d0, we set its RCB to 128, and it failed.

Set the RCB to 128 iff the Root Port supports a 128-byte RCB. Otherwise,
set RCB to 64 bytes. This effectively ignores what _HPX tells us about
RCB.

Note that this change only affects _HPX handling. If we have no _HPX, this
does nothing with RCB.

[bhelgaas: changelog, clear RCB if not set for Root Port]
Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Fixes: 7a1562d4f2d0 ("PCI: Apply _HPX Link Control settings to all devices with a link")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=187781
Tested-by: Frank Danapfel <fdanapfe@redhat.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Myron Stowe <myron.stowe@redhat.com>
CC: stable@vger.kernel.org # v3.18+


# 66b80809 27-Sep-2016 Keith Busch <kbusch@kernel.org>

PCI/AER: Cache capability position

Save the position of the error reporting capability so it doesn't need to
be rediscovered during error handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Lukas Wunner <lukas@wunner.de>


# 9bb04a0c 11-Jun-2016 Jonathan Yong <jonathan.yong@intel.com>

PCI: Add Precision Time Measurement (PTM) support

Add Precision Time Measurement (PTM) support (see PCIe r3.1, sec 6.22).

Enable PTM on PTM Root devices and switch ports. This does not enable PTM
on endpoints.

There currently are no PTM-capable devices on the market, but it is
expected to be supported by the Intel Apollo Lake platform.

[bhelgaas: complete rework]
Signed-off-by: Jonathan Yong <jonathan.yong@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e16b4660 21-Jul-2016 Keith Busch <kbusch@kernel.org>

PCI: Allow additional bus numbers for hotplug bridges

A user may hot add a switch requiring more than one bus to enumerate. This
previously required a system reboot if BIOS did not sufficiently pad the
bus resource, which they frequently don't do.

Add a kernel parameter so a user can specify the minimum number of bus
numbers to reserve for a hotplug bridge's subordinate buses so rebooting
won't be necessary.

The default is 1, which is equivalent to previous behavior.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d963f651 02-Jun-2016 Mika Westerberg <mika.westerberg@linux.intel.com>

PCI: Power on bridges before scanning new devices

When a PCI device is removed through sysfs interface, the upstream bridge
(PCIe port) can be runtime suspended if it was the last device on that bus.
Now, if the bridge is in D3 we cannot find devices below the bridge
anymore. For example following fails to find the removed device again:

# echo 1 > /sys/bus/pci/devices/0000:00:01.0/0000:01:00.0/remove
# echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan

Where 0000:00:01.0 is the bridge device.

In order to be able to rescan devices below the bridge add
pm_runtime_get_sync()/pm_runtime_put() calls to pci_scan_bridge(). This
should keep bridges powered on while their children devices are being
scanned.

Reported-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c7cb891 10-Jun-2016 Tomasz Nowicki <tn@semihalf.com>

PCI: Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC

Instead of assigning bus->domain_nr inside pci_bus_assign_domain_nr(),
return the domain and let the caller do the assignment. Rename
pci_bus_assign_domain_nr() to pci_bus_find_domain_nr() to reflect this.

No functional change intended.

[bhelgaas: changelog]
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>


# ad67b437 10-May-2016 Prarit Bhargava <prarit@redhat.com>

PCI: Disable all BAR sizing for devices with non-compliant BARs

b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant
BARs") disabled BAR sizing for BARs 0-5 of devices that don't comply with
the PCI spec. But it didn't do anything for expansion ROM BARs, so we
still try to size them, resulting in warnings like this on Broadwell-EP:

pci 0000:ff:12.0: BAR 6: failed to assign [mem size 0x00000001 pref]

Move the non-compliant BAR check from __pci_read_base() up to
pci_read_bases() so it applies to the expansion ROM BAR as well as
to BARs 0-5.

Note that direct callers of __pci_read_base(), like sriov_init(), will now
bypass this check. We haven't had reports of devices with broken SR-IOV
BARs yet.

[bhelgaas: changelog]
Fixes: b84106b4e229 ("PCI: Disable IO/MEM decoding for devices with non-compliant BARs")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andi Kleen <ak@linux.intel.com>


# 338c3149 03-Mar-2016 Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>

PCI: Add support for multiple DMA aliases

Solve IOMMU support issues with PCIe non-transparent bridges that use
Requester ID look-up tables (RID-LUT), e.g., the PEX8733.

The NTB connects devices in two independent PCI domains. Devices separated
by the NTB are not able to discover each other. A PCI packet being
forwared from one domain to another has to have its RID modified so it
appears on correct bus and completions are forwarded back to the original
domain through the NTB. The RID is translated using a preprogrammed table
(LUT) and the PCI packet propagates upstream away from the NTB. If the
destination system has IOMMU enabled, the packet will be discarded because
the new RID is unknown to the IOMMU. Adding a DMA alias for the new RID
allows IOMMU to properly recognize the packet.

Each device behind the NTB has a unique RID assigned in the RID-LUT. The
current DMA alias implementation supports only a single alias, so it's not
possible to support mutiple devices behind the NTB when IOMMU is enabled.

Enable all possible aliases on a given bus (256) that are stored in a
bitset. Alias devfn is directly translated to a bit number. The bitset is
not allocated for devices that have no need for DMA aliases.

More details can be found in the following article:
http://www.plxtech.com/files/pdf/technical/expresslane/RTC_Enabling%20MulitHostSystemDesigns.pdf

Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Joerg Roedel <jroedel@suse.de>


# 057bd2e0 09-Feb-2016 Thierry Reding <treding@nvidia.com>

PCI: Add pci_ops.{add,remove}_bus() callbacks

Add pci_ops.{add,remove}_bus() callbacks, which will be called on every
newly created bus and when a bus is being removed, respectively. This can
be used by drivers to implement driver-specific initialization and teardown
of the bus, in addition to the architecture-specifics implemented by the
pcibios_add_bus() and the pcibios_remove_bus() functions.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f1cd93f9 22-Feb-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Rename VPD symbols to remove unnecessary "pci22"

There's only one kind of VPD, so we don't need to qualify it as "the
version described by PCI spec rev 2.2."

Rename the following symbols to remove unnecessary "pci22":

PCI_VPD_PCI22_SIZE -> PCI_VPD_MAX_SIZE
pci_vpd_pci22_size() -> pci_vpd_size()
pci_vpd_pci22_wait() -> pci_vpd_wait()
pci_vpd_pci22_read() -> pci_vpd_read()
pci_vpd_pci22_write() -> pci_vpd_write()
pci_vpd_pci22_ops -> pci_vpd_ops
pci_vpd_pci22_init() -> pci_vpd_init()

Tested-by: Shane Seymour <shane.seymour@hpe.com>
Tested-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>


# b84106b4 25-Feb-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Disable IO/MEM decoding for devices with non-compliant BARs

The PCI config header (first 64 bytes of each device's config space) is
defined by the PCI spec so generic software can identify the device and
manage its usage of I/O, memory, and IRQ resources.

Some non-spec-compliant devices put registers other than BARs where the
BARs should be. When the PCI core sizes these "BARs", the reads and writes
it does may have unwanted side effects, and the "BAR" may appear to
describe non-sensical address space.

Add a flag bit to mark non-compliant devices so we don't touch their BARs.
Turn off IO/MEM decoding to prevent the devices from consuming address
space, since we can't read the BARs to find out what that address space
would be.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
CC: stable@vger.kernel.org


# 788858eb 16-Feb-2016 Jake Oshins <jakeo@microsoft.com>

PCI: Look up IRQ domain by fwnode_handle

If pci_host_bridge_msi_domain() can't find an IRQ domain through the OF
tree, try to look it up directly through the fwnode_handle.

[bhelgaas: changelog]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5bd28338 05-Feb-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove includes of empty asm-generic/pci-bridge.h

include/asm-generic/pci-bridge.h is now empty, so remove every #include of
it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com> (arm64)


# 5bbe029f 05-Feb-2016 Bjorn Helgaas <bhelgaas@google.com>

PCI: Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h

The PCI flag management constants and functions were previously declared in
include/asm-generic/pci-bridge.h. But they are not specific to bridges,
and arches did not include pci-bridge.h consistently.

Move the following interfaces and related constants to include/linux/pci.h
and remove pci-bridge.h:

pci_set_flags()
pci_add_flags()
pci_clear_flags()
pci_has_flag()

This fixes these warnings when building for some arches:

drivers/pci/host/pcie-designware.c:562:20: error: 'PCI_PROBE_ONLY' undeclared (first use in this function)
drivers/pci/host/pcie-designware.c:562:7: error: implicit declaration of function 'pci_has_flag' [-Werror=implicit-function-declaration]

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 471036b2 10-Dec-2015 Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

acpi: pci: Setup MSI domain for ACPI based pci devices

This patch introduces pci_msi_register_fwnode_provider() for irqchip
to register a callback, to provide a way to determine appropriate MSI
domain for a pci device.

It also introduces pci_host_bridge_acpi_msi_domain(), which returns
the MSI domain of the specified PCI host bridge with DOMAIN_BUS_PCI_MSI
bus token. Then, it is assigned to pci device.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>


# 8e5a395a 07-Dec-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI: Simplify config space size computation

Restructure the logic so we return the config space size as soon as we know
it. This reduces indentation, removes negations, and removes gotos.

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 128fc68c 30-Nov-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI/MSI: Remove empty pci_msi_init_pci_dev()

4a7cc8316705 ("genirq/MSI: Move msi_list from struct pci_dev to struct
device") removed the contents of pci_msi_init_pci_dev(). All
implementation of it are now empty, so remove it completely.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e80e7edc 20-Oct-2015 Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

PCI/MSI: Initialize MSI capability for all architectures

1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't
support MSI") moved dev->msi_cap and dev->msix_cap initialization from the
pci_init_capabilities() path (used on all architectures) to the
pci_setup_device() path (not used on Open Firmware architectures).

This broke MSI or MSI-X on Open Firmware machines. 4d9aac397a5d
("powerpc/PCI: Disable MSI/MSI-X interrupts at PCI probe time in OF case")
fixed it for PowerPC but not for SPARC.

Set up MSI and MSI-X (initialize msi_cap and msix_cap and disable MSI and
MSI-X) in pci_init_capabilities() so all architectures do it the same way.

This reverts 4d9aac397a5d since this patch fixes the problem generically
for both PowerPC and SPARC.

[bhelgaas: changelog, make pci_msi_setup_pci_dev() static]
Fixes: 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 768acd64 18-Nov-2015 Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

PCI: Fix OF logic in pci_dma_configure()

This patch fixes a bug introduced by previous commit,
which incorrectly checkes the of_node of the end-point device.
Instead, it should check the of_node of the host bridge.

Fixes: 50230713b639 ("PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 29dbe1f0 28-Oct-2015 Suthikulpanit, Suravee <Suravee.Suthikulpanit@amd.com>

PCI: ACPI: Add support for PCI device DMA coherency

This patch adds support for setting up PCI device DMA coherency from
ACPI _CCA object that should normally be specified in the DSDT node
of its PCI host bridge.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 50230713 28-Oct-2015 Suthikulpanit, Suravee <Suravee.Suthikulpanit@amd.com>

PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()

This patch move of_pci_dma_configure() to a more generic
pci_dma_configure(), which can be extended by non-OF code (e.g. ACPI).

This has no functional change.

Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Acked-by: Rob Herring <robh+dt@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 938174e5 29-Oct-2015 Sean O. Stalley <sean.stalley@intel.com>

PCI: Add support for Enhanced Allocation devices

Add support for devices using Enhanced Allocation entries instead of BARs.
This allows the kernel to parse the EA Extended Capability structure in PCI
config space and claim the BAR-equivalent resources.

See https://pcisig.com/sites/default/files/specification_documents/ECN_Enhanced_Allocation_23_Oct_2014_Final.pdf

[bhelgaas: add spec URL, s/pci_ea_set_flags/pci_ea_flags/, consolidate
declarations, print unknown property in hex to match spec]
Signed-off-by: Sean O. Stalley <sean.stalley@intel.com>
[david.daney@cavium.com: Add more support/checking for Entry Properties,
allow EA behind bridges, rewrite some error messages.]
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 54fa97ee 02-Oct-2015 Marc Zyngier <maz@kernel.org>

PCI/MSI: Allow the MSI domain to be device-specific

So far, we've always considered that for a given PCI device, its
MSI controller was either set by the architecture-specific
pcibios hook, or simply inherited from the host bridge.

This doesn't cover things like firmware-defined topologies like
msi-map (DT) or IORT (ACPI), which can provide information about
which MSI controller to use on a per-device basis.

This patch adds the necessary hook into the MSI code to allow this
feature, and provides the msi-map functionnality as a first
implementation.

Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>


# 098259eb 02-Oct-2015 Marc Zyngier <maz@kernel.org>

PCI: Add per-device MSI domain hook

So far, we have considered that the MSI domain for a device was
either set via the architecture-dependent pcibios implementation
or inherited from the host bridge.

As we're about to break that assumption, add pci_dev_msi_domain
which is the equivalent of pci_host_bridge_msi_domain, but for
a single device.

Other than moving things around a bit, this patch on its own
has no effect.

Acked-by: Rob Herring <robh@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>


# 38ea72bd 18-Sep-2015 Alex Williamson <alex.williamson@redhat.com>

PCI/MSI: Fix MSI IRQ domains for VFs on virtual buses

SR-IOV creates a virtual bus where bus->self is NULL. When we add VFs and
scan for an MSI domain, pci_set_bus_msi_domain() dereferences bus->self,
which causes a kernel NULL pointer dereference oops.

Scan up to the parent bus until we find a real bridge where we can get the
MSI domain.

[bhelgaas: changelog]
Fixes: 44aa0c657e3e ("PCI/MSI: Add hooks to populate the msi_domain field")
Tested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>


# b07461a8 17-Sep-2015 Taku Izumi <izumi.taku@jp.fujitsu.com>

PCI/AER: Clear error status registers during enumeration and restore

AER errors might be recorded when powering-on devices. These errors can be
ignored, so firmware usually clears them before the OS enumerates devices.
However, firmware is not involved when devices are added via hotplug, so
the OS may discover power-up errors that should be ignored. The same may
happen when powering up devices when resuming after suspend.

Clear the AER error status registers during enumeration and resume.

[bhelgaas: changelog, remove repetitive comments]
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 237865f1 15-Sep-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code"

Revert dff22d2054b5 ("PCI: Call pci_read_bridge_bases() from core instead
of arch code").

Reading PCI bridge windows is not arch-specific in itself, but there is PCI
core code that doesn't work correctly if we read them too early. For
example, Hannes found this case on an ARM Freescale i.mx6 board:

pci_bus 0000:00: root bus resource [mem 0x01000000-0x01efffff]
pci 0000:00:00.0: PCI bridge to [bus 01-ff]
pci 0000:00:00.0: BAR 8: no space for [mem size 0x01000000] (mem window)
pci 0000:01:00.0: BAR 2: failed to assign [mem size 0x00200000]
pci 0000:01:00.0: BAR 1: failed to assign [mem size 0x00004000]
pci 0000:01:00.0: BAR 0: failed to assign [mem size 0x00000100]

The 00:00.0 mem window needs to be at least 3MB: the 01:00.0 device needs
0x204100 of space, and mem windows are megabyte-aligned.

Bus sizing can increase a bridge window size, but never *decrease* it (see
d65245c3297a ("PCI: don't shrink bridge resources")). Prior to
dff22d2054b5, ARM didn't read bridge windows at all, so the "original size"
was zero, and we assigned a 3MB window.

After dff22d2054b5, we read the bridge windows before sizing the bus. The
firmware programmed a 16MB window (size 0x01000000) in 00:00.0, and since
we never decrease the size, we kept 16MB even though we only needed 3MB.
But 16MB doesn't fit in the host bridge aperture, so we failed to assign
space for the window and the downstream devices.

I think this is a defect in the PCI core: we shouldn't rely on the firmware
to assign sensible windows.

Ray reported a similar problem, also on ARM, with Broadcom iProc.

Issues like this are too hard to fix right now, so revert dff22d2054b5.

Reported-by: Hannes <oe5hpm@gmail.com>
Reported-by: Ray Jui <rjui@broadcom.com>
Link: http://lkml.kernel.org/r/CAAa04yFQEUJm7Jj1qMT57-LG7ZGtnhNDBe=PpSRa70Mj+XhW-A@mail.gmail.com
Link: http://lkml.kernel.org/r/55F75BB8.4070405@broadcom.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>


# 22b6839b 24-Aug-2015 Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>

PCI: Make pci_msi_setup_pci_dev() non-static for use by arch code

Commit 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel
doesn't support MSI") changed the location of the code that initialises
dev->msi_cap/msix_cap and then disables MSI/MSI-X interrupts at PCI
probe time in devices that have this flag set. It moved the code from
pci_msi_init_pci_dev() to a new function named pci_msi_setup_pci_dev(),
called by pci_setup_device().

The pseries PCI probing code does not call pci_setup_device(), so since
the aforementioned commit the function pci_msi_setup_pci_dev() is not
called and MSI/MSI-X interrupts are left enabled. Additionally because
dev->msi_cap/msix_cap are not initialised no driver can ever enable
MSI/MSI-X.

To fix this, the pseries PCI probe should manually call
pci_msi_setup_pci_dev(), so this patch makes it non-static.

Fixes: 1851617cd2da ("PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI")
[mpe: Update change log to mention dev->msi_cap/msix_cap]
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>


# 27d868b5 24-Aug-2015 Keith Busch <kbusch@kernel.org>

PCI: Set MPS to match upstream bridge

Firmware typically configures the PCIe fabric with a consistent Max Payload
Size setting based on the devices present at boot. A hot-added device
typically has the power-on default MPS setting (128 bytes), which may not
match the fabric.

The previous Linux default, in the absence of any "pci=pcie_bus_*" options,
was PCIE_BUS_TUNE_OFF, in which we never touch MPS, even for hot-added
devices.

Add a new default setting, PCIE_BUS_DEFAULT, in which we make sure every
device's MPS setting matches the upstream bridge. This makes it more
likely that a hot-added device will work in a system with optimized MPS
configuration.

Note that if we hot-add a device that only supports 128-byte MPS, it still
likely won't work because we don't reconfigure the rest of the fabric.
Booting with "pci=pcie_bus_peer2peer" is a workaround for this because it
sets MPS to 128 for everything.

[bhelgaas: changelog, new default, rework for pci_configure_device() path]
Tested-by: Keith Busch <keith.busch@intel.com>
Tested-by: Jordan Hargrave <jharg93@gmail.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# 9dae3a97 20-Aug-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI: Move MPS configuration check to pci_configure_device()

Previously we checked for invalid MPS settings, i.e., a device with MPS
different than its upstream bridge, in pcie_bus_detect_mps(). We only did
this if the arch or hotplug driver called pcie_bus_configure_settings(),
and then only if PCIe bus tuning was disabled (PCIE_BUS_TUNE_OFF).

Move the MPS checking code to pci_configure_device(), so we do it in the
pci_device_add() path for every device.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d2a7926d 03-Aug-2015 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Add pci_scan_root_bus_msi()

Add a pci_scan_root_bus_msi() interface so an arch can specify the MSI
controller up front. This removes the need for a pcibios callback to set
the MSI controller later.

This is not exported because I'd like to replace the variety of "scan root
bus" interfaces with a single, more extensible interface that can handle
the MSI controller, domain, pci_ops, resources, etc. I hope this interface
is temporary.

[bhelgaas: changelog, split into separate patch]
Suggested-by: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jingoo Han <jingoohan1@gmail.com>


# b35b1df5 17-Aug-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Tolerate hierarchies with no Root Port

We should not assume any particular hardware topology. Commit d0751b98dfa3
("PCI: Add dev->has_secondary_link to track downstream PCIe links") relied
on the assumption that every PCIe hierarchy is rooted at a Root Port. But
we can't rely on any assumption about what hardware we will find; we just
have to deal with the world as it is.

On some platforms, PCIe devices (endpoints, switch upstream ports, etc.)
appear directly on the root bus, and there is no Root Port in the PCI bus
hierarchy. For example, Meelis observed these top-level devices on a
Sparc V245:

0000:02:00.0 PCI bridge to [bus 03-0d] Switch Upstream Port
0001:02:00.0 PCI bridge to [bus 03] PCIe to PCI/PCI-X Bridge

These devices *look* like they have links going upstream, but there really
are no upstream devices.

In set_pcie_port_type(), we used the parent device to figure out which side
of a switch port has a link, so if the parent device did not exist, we
dereferenced a NULL parent pointer.

Check whether the parent device exists before dereferencing it.

Meelis observed this oops on Sparc V245 and T2000. Ben Herrenschmidt says
this is also possible on IBM PowerVM guests on PowerPC.

[bhelgaas: changelog, comment]
Link: http://lkml.kernel.org/r/alpine.LRH.2.20.1508122118210.18637@math.ut.ee
Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>


# edc90fee 17-Jul-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI: Allocate ATS struct during enumeration

Previously, we allocated pci_ats structures when an IOMMU driver called
pci_enable_ats(). An SR-IOV VF shares the STU setting with its PF, so when
enabling ATS on the VF, we allocated a pci_ats struct for the PF if it
didn't already have one. We held the sriov->lock to serialize threads
concurrently enabling ATS on several VFS so only one would allocate the PF
pci_ats.

Gregor reported a deadlock here:

pci_enable_sriov
sriov_enable
virtfn_add
mutex_lock(dev->sriov->lock) # acquire sriov->lock
pci_device_add
device_add
BUS_NOTIFY_ADD_DEVICE notifier chain
iommu_bus_notifier
amd_iommu_add_device # iommu_ops.add_device
init_iommu_group
iommu_group_get_for_dev
iommu_group_add_device
__iommu_attach_device
amd_iommu_attach_device # iommu_ops.attach_device
attach_device
pci_enable_ats
mutex_lock(dev->sriov->lock) # deadlock

There's no reason to delay allocating the pci_ats struct, and if we
allocate it for each device at enumeration-time, there's no need for
locking in pci_enable_ats().

Allocate pci_ats struct during enumeration, when we initialize other
capabilities.

Note that this implementation requires ATS to be enabled on the PF first,
before on any of the VFs because the PF controls the STU for all the VFs.

Link: http://permalink.gmane.org/gmane.linux.kernel.iommu/9433
Reported-by: Gregor Dick <gdick@solarflare.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>


# 92b19ff5 10-Aug-2015 Dan Williams <dan.j.williams@intel.com>

cleanup IORESOURCE_CACHEABLE vs ioremap()

Quoting Arnd:
I was thinking the opposite approach and basically removing all uses
of IORESOURCE_CACHEABLE from the kernel. There are only a handful of
them.and we can probably replace them all with hardcoded
ioremap_cached() calls in the cases they are actually useful.

All existing usages of IORESOURCE_CACHEABLE call ioremap() instead of
ioremap_nocache() if the resource is cacheable, however ioremap() is
uncached by default. Clearly none of the existing usages care about the
cacheability. Particularly devm_ioremap_resource() never worked as
advertised since it always fell back to plain ioremap().

Clean this up as the new direction we want is to convert
ioremap_<type>() usages to memremap(..., flags).

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 017ffe64 17-Jul-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Hold pci_slot_mutex while searching bus->slots list

Previously, pci_setup_device() and similar functions searched the
pci_bus->slots list without any locking. It was possible for another
thread to update the list while we searched it.

Add pci_dev_assign_slot() to search the list while holding pci_slot_mutex.

[bhelgaas: changelog, fold in CONFIG_SYSFS fix]
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b165e2b6 28-Jul-2015 Marc Zyngier <maz@kernel.org>

PCI/MSI: Add support for OF-provided msi_domain

In order to populate the PCI host bridge msi_domain, use the
"msi-parent" attribute to lookup a corresponding irq domain.
If found, this is our MSI domain.

This gets plugged into the core PCI code.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-6-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 44aa0c65 28-Jul-2015 Marc Zyngier <maz@kernel.org>

PCI/MSI: Add hooks to populate the msi_domain field

In order to be able to populate the device msi_domain field,
add the necessary hooks to propagate the host bridge msi_domain
across secondary busses to devices.

So far, nobody populates the initial msi_domain.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <linux-arm-kernel@lists.infradead.org>
Cc: Yijing Wang <wangyijing@huawei.com>
Cc: Ma Jun <majun258@huawei.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Duc Dang <dhdang@apm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1438091186-10244-5-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# dff22d20 09-Jul-2015 Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

PCI: Call pci_read_bridge_bases() from core instead of arch code

When we scan a PCI bus, we read PCI-PCI bridge window registers with
pci_read_bridge_bases() so we can validate the resource hierarchy. Most
architectures call pci_read_bridge_bases() from pcibios_fixup_bus(), but
PCI-PCI bridges are not arch-specific, so this doesn't need to be in
arch-specific code.

Call pci_read_bridge_bases() directly from the PCI core instead of from
arch code.

For alpha and mips, we now call pci_read_bridge_bases() always; previously
we only called it if PCI_PROBE_ONLY was set.

[bhelgaas: changelog]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: James E.J. Bottomley <jejb@parisc-linux.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Bjorn Helgaas <bhelgaas@google.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: David Howells <dhowells@redhat.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Tony Luck <tony.luck@intel.com>
CC: David S. Miller <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Guenter Roeck <linux@roeck-us.net>
CC: Michal Simek <monstr@monstr.eu>
CC: Chris Zankel <chris@zankel.net>


# 2b4aed1d 19-Jun-2015 Bjorn Helgaas <bhelgaas@google.com>

PCI: Shift PCI_CLASS_NOT_DEFINED consistently with other classes

The PCI class in dev->class is a three-byte value comprising a base class,
sub-class, and interface type. PCI_CLASS_NOT_DEFINED includes the base
class and sub-class, but not the interface type, so it should be shifted to
make space for the interface. It happens that PCI_CLASS_NOT_DEFINED is
zero, so it doesn't matter in the end, but we should still use it
consistently with other class definitions.

Treat PCI_CLASS_NOT_DEFINED as a base class/sub-class value that should
appear in bits 8-23 of dev->class.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c0300089 28-Apr-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Remove unused pci_scan_bus_parented()

No one uses pci_scan_bus_parented() any more, remove it.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3a9ad0b4 27-May-2015 Yinghai Lu <yinghai@kernel.org>

PCI: Add pci_bus_addr_t

David Ahern reported that d63e2e1f3df9 ("sparc/PCI: Clip bridge windows
to fit in upstream windows") fails to boot on sparc/T5-8:

pci 0000:06:00.0: reg 0x184: can't handle BAR above 4GB (bus address 0x110204000)

The problem is that sparc64 assumed that dma_addr_t only needed to hold DMA
addresses, i.e., bus addresses returned via the DMA API (dma_map_single(),
etc.), while the PCI core assumed dma_addr_t could hold *any* bus address,
including raw BAR values. On sparc64, all DMA addresses fit in 32 bits, so
dma_addr_t is a 32-bit type. However, BAR values can be 64 bits wide, so
they don't fit in a dma_addr_t. d63e2e1f3df9 added new checking that
tripped over this mismatch.

Add pci_bus_addr_t, which is wide enough to hold any PCI bus address,
including both raw BAR values and DMA addresses. This will be 64 bits
on 64-bit platforms and on platforms with a 64-bit dma_addr_t. Then
dma_addr_t only needs to be wide enough to hold addresses from the DMA API.

[bhelgaas: changelog, bugzilla, Kconfig to ensure pci_bus_addr_t is at
least as wide as dma_addr_t, documentation]
Fixes: d63e2e1f3df9 ("sparc/PCI: Clip bridge windows to fit in upstream windows")
Fixes: 23b13bc76f35 ("PCI: Fail safely if we can't handle BARs larger than 4GB")
Link: http://lkml.kernel.org/r/CAE9FiQU1gJY1LYrxs+ma5LCTEEe4xmtjRG0aXJ9K_Tsu+m9Wuw@mail.gmail.com
Link: http://lkml.kernel.org/r/1427857069-6789-1-git-send-email-yinghai@kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=96231
Reported-by: David Ahern <david.ahern@oracle.com>
Tested-by: David Ahern <david.ahern@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
CC: stable@vger.kernel.org # v3.19+


# 777e61ea 21-May-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Use dev->has_secondary_link to find downstream PCIe links

Previously we assumed that PCIe Root Ports and Downstream Ports had Links
on their secondary side. That is true in most systems, but it is possible
to connect a switch with either an Upstream or a Downstream Port leading
downstream.

Instead of relying on the component type to identify devices that have
links leading downstream, use the "dev->has_secondary_link" field.

[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d0751b98 21-May-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Add dev->has_secondary_link to track downstream PCIe links

A PCIe Port is an interface to a Link. A Root Port is a PCI-PCI bridge in
a Root Complex and has a Link on its secondary (downstream) side. For
other Ports, the Link may be on either the upstream (closer to the Root
Complex) or downstream side of the Port.

The usual topology has a Root Port connected to an Upstream Port. We
previously assumed this was the only possible topology, and that a
Downstream Port's Link was always on its downstream side, like this:

+---------------------+
+------+ | Downstream |
| Root | | Upstream Port +--Link--
| Port +--Link--+ Port |
+------+ | Downstream |
| Port +--Link--
+---------------------+

But systems do exist (see URL below) where the Root Port is connected to a
Downstream Port. In this case, a Downstream Port's Link may be on either
the upstream or downstream side:

+---------------------+
+------+ | Upstream |
| Root | | Downstream Port +--Link--
| Port +--Link--+ Port |
+------+ | Downstream |
| Port +--Link--
+---------------------+

We can't use the Port type to determine which side the Link is on, so add a
bit in struct pci_dev to keep track.

A Root Port's Link is always on the Port's secondary side. A component
(Endpoint or Port) on the other end of the Link obviously has the Link on
its upstream side. If that component is a Port, it is part of a Switch or
a Bridge. A Bridge has a PCI or PCI-X bus on its secondary side, not a
Link. The internal bus of a Switch connects the Port to another Port whose
Link is on the downstream side.

[bhelgaas: changelog, comment, cache "type", use if/else]
Link: http://lkml.kernel.org/r/54EB81B2.4050904@pobox.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94361
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 1851617c 07-May-2015 Michael S. Tsirkin <mst@redhat.com>

PCI/MSI: Disable MSI at enumeration even if kernel doesn't support MSI

If we enable MSI, then kexec a new kernel, the new kernel may receive MSIs
it is not prepared for. Commit d5dea7d95c48 ("PCI: msi: Disable msi
interrupts when we initialize a pci device") prevents this, but only if the
new kernel is built with CONFIG_PCI_MSI=y.

Move the "disable MSI" functionality from drivers/pci/msi.c to a new
pci_msi_setup_pci_dev() in drivers/pci/probe.c so we can disable MSIs when
we enumerate devices even if the kernel doesn't include full MSI support.

[bhelgaas: changelog, disable MSIs in pci_setup_device(), put
pci_msi_setup_pci_dev() at its final destination]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e6b29dea 08-Apr-2015 Ray Jui <rjui@broadcom.com>

PCI: Export symbols required for loadable host driver modules

Export the following symbols so they can be referenced by a PCI host bridge
driver compiled as a kernel loadable module:

pci_common_swizzle
pci_create_root_bus
pci_stop_root_bus
pci_remove_root_bus
pci_assign_unassigned_bus_resources
pci_fixup_irqs

Signed-off-by: Ray Jui <rjui@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>


# b97ea289 15-Mar-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Assign resources before drivers claim devices (pci_scan_root_bus())

Previously, pci_scan_root_bus() created a root PCI bus, enumerated the
devices on it, and called pci_bus_add_devices(), which made the devices
available for drivers to claim them.

Most callers assigned resources to devices after pci_scan_root_bus()
returns, which may be after drivers have claimed the devices. This is
incorrect; the PCI core should not change device resources while a driver
is managing the device.

Remove pci_bus_add_devices() from pci_scan_root_bus() and do it after any
resource assignment in the callers.

Note that ARM's pci_common_init_dev() already called pci_bus_add_devices()
after pci_scan_root_bus(), so we only need to remove the first call:

pci_common_init_dev
pcibios_init_hw
pci_scan_root_bus
pci_bus_add_devices # first call
pci_bus_assign_resources
pci_bus_add_devices # second call

[bhelgaas: changelog, drop "root_bus" var in alpha common_init_pci(),
return failure earlier in mn10300, add "return" in x86 pcibios_scan_root(),
return early if xtensa platform_pcibios_fixup() fails]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Matt Turner <mattst88@gmail.com>
CC: David Howells <dhowells@redhat.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Michal Simek <monstr@monstr.eu>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Chris Metcalf <cmetcalf@ezchip.com>
CC: Chris Zankel <chris@zankel.net>
CC: Max Filippov <jcmvbkbc@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>


# c90570d9 08-Mar-2015 Yijing Wang <wangyijing@huawei.com>

PCI: Assign resources before drivers claim devices (pci_scan_bus())

Previously, pci_scan_bus() created a root PCI bus, enumerated the devices
on it, and called pci_bus_add_devices(), which made the devices available
for drivers to claim them.

Most callers assigned resources to devices after pci_scan_bus() returns,
which may be after drivers have claimed the devices. This is incorrect;
the PCI core should not change device resources while a driver is managing
the device.

Remove pci_bus_add_devices() from pci_scan_bus() and do it after any
resource assignment in the callers.

[bhelgaas: changelog, check for failure in mcf_pci_init()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Guan Xuetao <gxt@mprc.pku.edu.cn>
CC: Richard Henderson <rth@twiddle.net>
CC: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
CC: Matt Turner <mattst88@gmail.com>


# de335bb4 02-Mar-2015 Murali Karicheri <m-karicheri2@ti.com>

PCI: Update DMA configuration from DT

If there is a DT node available for the root bridge's parent device, use
the DMA configuration from that device node. For example, Keystone PCI
devices would require dma_pfn_offset to be set correctly in the device
structure of the PCI device in order to have the correct DMA mask. The DT
node will have dma-ranges defined for this. Also support using the DT
property dma-coherent to allow coherent DMA operation by the PCI device.

Use the new helper function of_pci_dma_configure() to update the device DMA
configuration. This fixes DMA on systems where DMA addresses are a
constant offset from CPU physical addresses.

Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> (AMD Seattle)
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
CC: Joerg Roedel <joro@8bytes.org>
CC: Grant Likely <grant.likely@linaro.org>
CC: Rob Herring <robh+dt@kernel.org>
CC: Russell King <linux@arm.linux.org.uk>
CC: Arnd Bergmann <arnd@arndb.de>


# 14d76b68 04-Feb-2015 Jiang Liu <jiang.liu@linux.intel.com>

PCI: Use common resource list management code instead of private implementation

Use common resource list management data structure and interfaces
instead of private implementation.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7e79c5f8 30-Oct-2014 Myron Stowe <myron.stowe@redhat.com>

PCI: Add informational printk for invalid BARs

As a consequence of restoring the detection of invalid BARs, add a new
informational printk like the following when such occurrences are
encountered.

pci ssss:bb:dd.f: [Firmware Bug]: reg 0xXX: invalid BAR (can't size)

Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>


# 7fc986d8 19-Nov-2014 Yinghai Lu <yinghai@kernel.org>

PCI: Support 64-bit bridge windows if we have 64-bit dma_addr_t

Aaron reported that a 32-bit x86 kernel with Physical Address Extension
(PAE) support complains about bridge prefetchable memory windows above 4GB:

pci_bus 0000:00: root bus resource [mem 0x380000000000-0x383fffffffff]
...
pci 0000:03:00.0: reg 0x10: [mem 0x383fffc00000-0x383fffdfffff 64bit pref]
pci 0000:03:00.0: reg 0x20: [mem 0x383fffe04000-0x383fffe07fff 64bit pref]
pci 0000:03:00.1: reg 0x10: [mem 0x383fffa00000-0x383fffbfffff 64bit pref]
pci 0000:03:00.1: reg 0x20: [mem 0x383fffe00000-0x383fffe03fff 64bit pref]
pci 0000:00:02.2: PCI bridge to [bus 03-04]
pci 0000:00:02.2: bridge window [io 0x1000-0x1fff]
pci 0000:00:02.2: bridge window [mem 0x91900000-0x91cfffff]
pci 0000:00:02.2: can't handle 64-bit address space for bridge

In this kernel, unsigned long is 32 bits and dma_addr_t is 64 bits.
Previously we used "unsigned long" to hold the bridge window address. But
this is a bus address, so we should use dma_addr_t instead.

Use dma_addr_t to hold the bridge window base and limit.

The question of whether the CPU can actually *address* the window is
separate and depends on what the physical address space of the CPU is and
whether the host bridge does any address translation.

[bhelgaas: fix "shift count > width of type", changelog, stable tag]
Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=88131
Reported-by: Aaron Ma <mapengyu@gmail.com>
Tested-by: Aaron Ma <mapengyu@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.14+


# 7a1562d4 11-Nov-2014 Yinghai Lu <yinghai@kernel.org>

PCI: Apply _HPX Link Control settings to all devices with a link

Previously we applied _HPX type 2 record Link Control register settings
only to bridges with a subordinate bus. But it's better to apply them to
all devices with a link because if the subordinate bus has not been
allocated yet, we won't apply settings to the device.

Use pcie_cap_has_lnkctl() to determine whether the device has a Link
Control register instead of looking at dev->subordinate.

[bhelgaas: changelog]
Fixes: 6cd33649fa83 ("PCI: Add pci_configure_device() during enumeration")
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ff0387c3 10-Nov-2014 Markus Elfring <elfring@users.sourceforge.net>

PCI: Delete unnecessary NULL pointer checks

The functions pci_dev_put(), pci_pme_wakeup_bus(), and put_device() return
immediately if their argument is NULL. Thus the test before the call is
not needed.

Remove these unnecessary tests.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f795d86a 30-Oct-2014 Myron Stowe <myron.stowe@redhat.com>

PCI: Shrink decoding-disabled window while sizing BARs

__pci_read_base() disables decoding while sizing device BARs. We can't
print while decoding is disabled, which leads to some rather messy exit
logic.

Coalesce the sizing logic to minimize the time decoding is disabled. This
lets us print errors where they're detected.

The refactoring also takes advantage of the symmetry of obtaining the BAR's
extent (pci_size) and storing the result as the 'region' for both the
32-bit and 64-bit BARs, consolidating both cases.

No functional change intended.

[bhelgaas: move pci_size() up, per Thomas Petazzoni, Thierry Reding, Kevin Hilman]
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>


# 36e81648 30-Oct-2014 Myron Stowe <myron.stowe@redhat.com>

PCI: Restore detection of read-only BARs

Commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") masked off
low-order bits from 'l', but not from 'sz'. Both are passed to pci_size(),
which compares 'base == maxbase' to check for read-only BARs. The masking
of 'l' means that comparison will never be 'true', so the check for
read-only BARs no longer works.

Resolve this by also masking off the low-order bits of 'sz' before passing
it into pci_size() as 'maxbase'. With this change, pci_size() will once
again catch the problems that have been encountered to date:

- AGP aperture BAR of AMD-7xx host bridges: if the AGP window is
disabled, this BAR is read-only and read as 0x00000008 [1]

- BARs 0-4 of ALi IDE controllers can be non-zero and read-only [1]

- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]

- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]

Link: [1] https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9 ("PCI: probing read-only BARs" (pre-git))
Link: [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
Link: [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991
Reported-by: William Unruh <unruh@physics.ubc.ca>
Reported-by: Martin Lucina <martin@lucina.net>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Matthew Wilcox <willy@linux.intel.com>
CC: stable@vger.kernel.org # v2.6.27+


# 670ba0c8 29-Sep-2014 Catalin Marinas <catalin.marinas@arm.com>

PCI: Add generic domain handling

The handling of PCI domains (or PCI segments in ACPI speak) is usually a
straightforward affair but its implementation is currently left to the
architectural code, with pci_domain_nr(b) querying the value of the domain
associated with bus b.

This patch introduces CONFIG_PCI_DOMAINS_GENERIC as an option that can be
selected if an architecture wants a simple implementation where the value
of the domain associated with a bus is stored in struct pci_bus.

The architectures that select CONFIG_PCI_DOMAINS_GENERIC will then have to
implement pci_bus_assign_domain_nr() as a way of setting the domain number
associated with a root bus. All child buses except the root bus will
inherit the domain_nr value from their parent.

Signed-off-by: Catalin Marinas <Catalin.Marinas@arm.com>
[Renamed pci_set_domain_nr() to pci_bus_assign_domain_nr()]
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: Arnd Bergmann <arnd@arndb.de>


# 12d87069 19-Sep-2014 Bjorn Helgaas <bhelgaas@google.com>

Revert "PCI: Make sure bus number resources stay within their parents bounds"

This reverts commit 1820ffdccb9b ("PCI: Make sure bus number resources stay
within their parents bounds") because it breaks some systems with LSI Logic
FC949ES Fibre Channel Adapters, apparently by exposing a defect in those
adapters.

Dirk tested a Tyan VX50 (B4985) with this device that worked like this
prior to 1820ffdccb9b:

bus: [bus 00-7f] on node 0 link 1
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07])
pci 0000:00:0e.0: PCI bridge to [bus 0a]
pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07])
pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter)

Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS
defect in the PCI0 _CRS method. But prior to 1820ffdccb9b, we didn't
enforce that aperture, and the FC adapter worked fine at 0a:00.0.

After 1820ffdccb9b, we notice that 00:0e.0's aperture is not contained in
the root bridge's aperture, so we reconfigure it so it *is* contained:

pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring
pci 0000:00:0e.0: PCI bridge to [bus 06-07]

This effectively moves the FC device from 0a:00.0 to 07:00.0, which should
be legal. But when we enumerate bus 06, the FC device doesn't respond, so
we don't find anything. This is probably a defect in the FC device.

Possible fixes (due to Yinghai):

1) Add a quirk to fix the _CRS information based on what amd_bus.c read
from the hardware

2) Reset the FC device after we change its bus number

3) Revert 1820ffdccb9b

Fix 1 would be relatively easy, but it does sweep the LSI FC issue under
the rug. We might want to reconfigure bus numbers in the future for some
other reason, e.g., hotplug, and then we could trip over this again.

For that reason, I like fix 2, but we don't know whether it actually works,
and we don't have a patch for it yet.

This revert is fix 3, which also sweeps the LSI FC issue under the rug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281
Reported-by: Dirk Gouders <dirk@gouders.net>
Tested-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.15+
CC: Yinghai Lu <yinghai@kernel.org>


# 7a0b33d4 19-Sep-2014 Bjorn Helgaas <bhelgaas@google.com>

Revert "PCI: Don't scan random busses in pci_scan_bridge()"

This reverts commit fc1b253141b3 ("PCI: Don't scan random busses in
pci_scan_bridge()") because it breaks CardBus on some machines.

David tested a Dell Latitude D505 that worked like this prior to
fc1b253141b3:

pci 0000:00:1e.0: PCI bridge to [bus 01]
pci 0000:01:01.0: CardBus bridge to [bus 02-05]

Note that the 01:01.0 CardBus bridge has a bus number aperture of
[bus 02-05], but those buses are all outside the 00:1e.0 PCI bridge bus
number aperture, so accesses to buses 02-05 never reach CardBus. This is
later patched up by yenta_fixup_parent_bridge(), which changes the
subordinate bus number of the 00:1e.0 PCI bridge:

pci_bus 0000:01: Raising subordinate bus# of parent bus (#01) from #01 to #05

With fc1b253141b3, pci_scan_bridge() fails immediately when it notices that
we can't allocate a valid secondary bus number for the CardBus bridge, and
CardBus doesn't work at all:

pci 0000:01:01.0: can't allocate child bus 01 from [bus 01]

I'd prefer to fix this by integrating the yenta_fixup_parent_bridge() logic
into pci_scan_bridge() so we fix the bus number apertures up front. But
I don't think we can do that before v3.17, so I'm going to revert this to
avoid the problem while we're working on the long-term fix.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=83441
Link: http://lkml.kernel.org/r/1409303414-5196-1-git-send-email-david.henningsson@canonical.com
Reported-by: David Henningsson <david.henningsson@canonical.com>
Tested-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.15+


# 1302fcf0 30-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Configure *all* devices, not just hot-added ones

There's not really a good way to determine whether firmware has already
configured a device with _HPP/_HPX settings. On legacy systems, the BIOS
has probably configured everything, but on UEFI systems it is not required
to do so.

Per the PCI Firmware Specification, rev 3.1, sec 3.5, if PCI_COMMAND_IO or
PCI_COMMAND_MEMORY is set, we can assume firmware has set the corresponding
BARs and maybe we can assume it has configured the rest of the device. And
if a bridge has PCI_COMMAND_PARITY or PCI_COMMAND_SERR set, we can assume
firmware has configured the bridge. But we can't tell much about devices
without BARs.

I think it should be safe to apply _HPP and _HPX settings anyway, even if
firmware has already configured the device, so configure everything we
find.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# 302328c0 03-Sep-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Preserve MPS and MRRS when applying _HPX settings

Linux manages MPS and MRRS settings to keep them consistent across the PCIe
fabric. BIOS doesn't participate in this Linux management, so ignore that
part of any _HPX settings it supplies.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# ca0647e0 30-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Apply _HPP settings to all hot-added PCI devices

We currently apply _HPP settings only to:

- non-bridge devices, and
- PCI-to-PCI bridges

i.e., we do not apply them to PCI-to-ISA bridges and the like. It has been
that way since _HPP support was added by 40abb96c51bb ("pciehp: Fix
programming hotplug parameters"), but I don't think there's any reason to
exclude these other bridges.

Apply _HPP settings to hot-added PCI devices of any type.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# eab3a0ee 27-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Preserve BIOS PCI_COMMAND_SERR and PCI_COMMAND_PARITY settings

Do not clear PCI_COMMAND_SERR or PCI_COMMAND_PARITY based on _HPP. The
spec (ACPI rev 5.0, sec 6.2.7) says that when "Enable SERR" is set to 1,
we should enable SERR in the command register. It says nothing about
*disabling* SERR or PERR; in fact, the example in 6.2.7.1 says we should
leave PERR alone unless "Enable PERR" is 1.

For hot-added devices, this probably doesn't matter because they power up
with these bits cleared. But in addition to hot-plugged devices, the spec
allows the platform to use _HPP for "configuration of PCI devices not
configured by the BIOS at system boot," and it may make a difference for
devices present at boot.

This change means that if BIOS enables SERR or PERR on a device, and it
supplies _HPP or _HPX with the SERR or PERR bits *cleared*, we will now
leave SERR or PERR reporting enabled on that device instead of disabling it
as we previously did.

See also 40abb96c51bb ("pciehp: Fix programming hotplug parameters"), where
this code was first added.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# c6285fc5 29-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Apply _HPP settings to PCIe devices as well as PCI and PCI-X

The ACPI _HPP method was defined before PCIe existed, so its documentation
only mentions PCI. The _HPX Type 0 setting record is essentially identical
to _HPP, but the spec (ACPI rev 5.0, sec 6.2.8.1) says it should be applied
to PCI, PCI-X, and PCIe devices, with settings being ignored if they are
not applicable.

Some platforms with both conventional PCI and PCIe devices provide only
_HPP (not _HPX), so treat _HPP the same way as an _HPX Type 0 record and
apply it to PCIe devices as well as PCI and PCI-X.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# fbfa398b 28-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove unused pci_configure_slot()

All pci_configure_slot() uses have been removed, so remove the definition
as well.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# 6cd33649 27-Aug-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Add pci_configure_device() during enumeration

Some platforms can tell the OS how to configure PCI devices, e.g., how to
set cache line size, error reporting enables, etc. ACPI defines _HPP and
_HPX methods for this purpose.

This configuration was previously done by some of the hotplug drivers using
pci_configure_slot(). But not all hotplug drivers did this, and per the
spec (ACPI rev 5.0, sec 6.2.7), we can also do it for "devices not
configured by the BIOS at system boot."

Move this configuration into the PCI core by adding pci_configure_device()
and calling it from pci_device_add(), so we do this for all devices as we
enumerate them.

This is based on pci_configure_slot(), which is used by hotplug drivers.
I omitted:

- pcie_bus_configure_settings() because it configures MPS and MRRS, which
requires global knowledge of the fabric and must be done later, and

- configuration of subordinate devices; that will happen when we call
pci_device_add() for those devices.

Because pci_configure_slot() was only done by hotplug drivers, this initial
version of pci_configure_device() only configures hot-added devices,
ignoring anything added during boot.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>


# 589fcc23 12-Sep-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Move pci_configure_slot() to drivers/pci/probe.c

Move pci_configure_slot() and related functions from
drivers/pci/hotplug/pcihp_slot to drivers/pci/probe.c.

This is to prepare for doing device configuration during the normal
enumeration process instead of just after hot-add.

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f3dbd802 02-Sep-2014 Rajat Jain <rajatxjain@gmail.com>

PCI: Enable CRS Software Visibility for root port if it is supported

Per PCIe r3.0, sec 2.3.2, an endpoint may respond to a Configuration
Request with a Completion with Configuration Request Retry Status (CRS).
This terminates the Configuration Request.

When the CRS Software Visibility feature is disabled (as it is by default),
a Root Complex must handle a CRS Completion by re-issuing the Configuration
Request. This is invisible to software. From the CPU's point of view, an
endpoint that always responds with CRS causes a hang because the Root
Complex never supplies data to complete the CPU read.

When CRS Software Visibility is enabled, a Root Complex that receives a CRS
Completion for a read of the Vendor ID must return data of 0x0001. The
Vendor ID of 0x0001 indicates to software that the endpoint is not ready.

We now have more devices that require CRS Software Visibility. For
example, a PLX 8713 NT bridge may respond with CRS until it has been
configured via I2C, and the I2C configuration is completely independent of
PCI enumeration.

Enable CRS Software Visibility if it is supported. This allows a system
with such a device to work (though the PCI core times out waiting for it to
become ready, and we have to rescan the bus after it is ready).

This essentially reverts ad7edfe04908 ("[PCI] Do not enable CRS Software
Visibility by default"). The failures that led to ad7edfe04908 should be
addressed by 89665a6a7140 ("PCI: Check only the Vendor ID to identify
Configuration Request Retry").

[bhelgaas: changelog]
Link: http://lkml.kernel.org/r/20071029061532.5d10dfc6@snowcone
Link: http://lkml.kernel.org/r/alpine.LFD.0.9999.0712271023090.21557@woody.linux-foundation.org
Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 89665a6a 08-Sep-2014 Rajat Jain <rajatxjain@gmail.com>

PCI: Check only the Vendor ID to identify Configuration Request Retry

Per PCIe r3.0, sec 2.3.2, if a Root Complex

- has Configuration Request Retry Status Software Visibility enabled,
- issues a Configuration Read of both bytes of the Vendor ID, and
- receives a Completion with Configuration Request Retry Status (CRS),

it must complete the request to the host by fabricating data of 0x0001 for
the Vendor ID and 0xff for any additional bytes in the request.

Linux issues a single config read for the four bytes containing the Vendor
ID and the Device ID. Previously we checked all four bytes for 0xffff0001
to identify CRS.

However, it is only the Vendor ID that really indicates CRS, because it's
sufficient to read only those two bytes. Checking the Device ID verifies
spec compliance but doesn't add any information.

Some Root Complexes appear to indicate CRS by returning 0x0001 for the
Vendor ID along with the actual the Device ID. Previously we interpreted
that as a valid Vendor/Device ID pair, although 0x0001 is reserved and
cannot be a valid Vendor ID.

[bhelgaas: changelog]
Link: http://lkml.kernel.org/r/4729FC36.3040000@gmail.com
Signed-off-by: Rajat Jain <rajatxjain@gmail.com>
Signed-off-by: Rajat Jain <rajatjain@juniper.net>
Signed-off-by: Guenter Roeck <groeck@juniper.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 227f0647 18-Apr-2014 Ryan Desfosses <ryan@desfo.org>

PCI: Merge multi-line quoted strings

Merge quoted strings that are broken across lines into a single entity.
The compiler merges them anyway, but checkpatch complains about it, and
merging them makes it easier to grep for strings.

No functional change.

[bhelgaas: changelog, do the same for everything under drivers/pci]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3c78bc61 18-Apr-2014 Ryan Desfosses <ryan@desfo.org>

PCI: Whitespace cleanup

Fix various whitespace errors.

No functional change.

[bhelgaas: fix other similar problems]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b7fe9434 25-Apr-2014 Ryan Desfosses <ryan@desfo.org>

PCI: Move EXPORT_SYMBOL so it immediately follows function/variable

Move EXPORT_SYMBOL so it immediately follows the function or variable.

No functional change.

[bhelgaas: squash similar changes, fix hotplug, probe, rom, search, too]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 782a985d 20-May-2014 Alex Williamson <alex.williamson@redhat.com>

PCI: Introduce new device binding path using pci_dev.driver_override

The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device. This shortcuts the existing process of looking up the vendor and
device ID, adding them to the driver new_id, binding the device, then
removing the ID, but it also provides a couple advantages.

First, the above existing process allows the driver to bind to any device
matching the new_id for the window where it's enabled. This is often not
desired, such as the case of trying to bind a single device to a meta
driver like pci-stub or vfio-pci. Using driver_override we can do this
deterministically using:

echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

Previously we could not invoke drivers_probe after adding a device to
new_id for a driver as we get non-deterministic behavior whether the driver
we intend or the standard driver will claim the device. Now it becomes a
deterministic process, only the driver matching driver_override will probe
the device.

To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:

echo > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

Another advantage to this approach is that we can specify a driver override
to force a specific binding or prevent any binding. For instance when an
IOMMU group is exposed to userspace through VFIO we require that all
devices within that group are owned by VFIO. However, devices can be
hot-added into an IOMMU group, in which case we want to prevent the device
from binding to any driver (override driver = "none") or perhaps have it
automatically bind to vfio-pci. With driver_override it's a simple matter
for this field to be set internally when the device is first discovered to
prevent driver matches.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 78916b00 05-May-2014 Alex Williamson <alex.williamson@redhat.com>

PCI: Test for std config alias when testing extended config space

When a PCI-to-PCIe bridge is stacked on a PCIe-to-PCI bridge, we can have
PCIe endpoints masked by a conventional PCI bus. This makes the extended
config space of the PCIe endpoint inaccessible. The PCIe-to-PCI bridge is
supposed to handle any type 1 configuration transactions where the extended
config offset bits are non-zero as an Unsupported Request rather than
forward it to the secondary interface. As noted here, there are a couple
known offenders to this rule. These bridges drop the extended offset bits,
resulting in the conventional config space being aliased many times across
the extended config space. For Intel NICs, this alias often seems to
expose a bogus SR-IOV cap.

Stacking bridges may seem like an uncommon scenario, but note that any
conventional PCI slot in a modern PC is already the secondary interface of
an onboard PCIe-to-PCI bridge. The user need only add a PCI-to-PCIe
adapter and PCIe device to encounter this problem.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6788a51f 03-May-2014 Yijing Wang <wangyijing@huawei.com>

PCI: Use pci_is_bridge() to simplify code

Use pci_is_bridge() to simplify code. No functional change.

Requires: 326c1cdae741 PCI: Rename pci_is_bridge() to pci_has_subordinate()
Requires: 1c86438c9423 PCI: Add new pci_is_bridge() interface
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d739a099 14-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Don't add disabled subtractive decode bus resources

For a subtractive decode bridge, we previously added and printed all
resources of the primary bus, even if they were not valid. In the example
below, the bridge 00:1c.3 has no windows enabled, so there are no valid
resources on bus 02. But since 02:00.0 is subtractive decode bridge, we
add and print all those invalid resources, which don't really make sense:

pci 0000:00:1c.3: PCI bridge to [bus 02-03]
pci 0000:02:00.0: PCI bridge to [bus 03] (subtractive decode)
pci 0000:02:00.0: bridge window [??? 0x00000000 flags 0x0] (subtractive decode)

Add and print the subtractively-decoded resources only if they are valid.

There's an example in the dmesg log attached to the bugzilla below (but
this patch doesn't fix the bug reported there).

Link: https://bugzilla.kernel.org/show_bug.cgi?id=73141
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 26370fc6 14-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Don't print anything while decoding is disabled

If the console is a PCI device, and we try to print to it while its
decoding is disabled, the system will hang. This particular printk hasn't
caused a problem yet, but it could, so this fixes it.

See also 0ff9514b579b ("PCI: Don't print anything while decoding is
disabled").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 31e9dd25 29-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Don't set BAR to zero if dma_addr_t is too small

If a BAR is above 4GB and our dma_addr_t is too small, don't clear the BAR
to zero: that doesn't disable the BAR, and it makes it more likely that the
BAR will conflict with things if we turn on the memory enable bit (as we
will at "out:" if the device was already enabled at the handoff).

We should also print the BAR info and its original size so we can follow
the process when we try to assign space to it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 72dc5601 29-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Don't convert BAR address to resource if dma_addr_t is too small

If dma_addr_t is too small to represent the BAR value,
pcibios_bus_to_resource() will fail, so just remember the BAR size directly
in the resource. The resource is already marked UNSET, so we know the
address isn't valid anyway.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d1a313e4 29-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Reject BAR above 4GB if dma_addr_t is too small

We can only handle BARs above 4GB if dma_addr_t (not resource_size_t) is 64
bits wide. If we have a 64-bit resource_size_t and a 32-bit dma_addr_t,
we can't deal with BARs above 4GB.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 23b13bc7 14-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fail safely if we can't handle BARs larger than 4GB

We can only handle BARs larger than 4GB if both dma_addr_t and
resource_size_t are 64 bits wide. If dma_addr_t is 32 bits, we can't
represent all the bus addresses, and if resource_size_t is 32 bits, we
can't represent all the CPU addresses.

Previously we cleared res->flags (at "fail:") for resources that were too
large. That means we think the BAR doesn't exist at all, which in turn
means that we could enable the device even though we can't keep track of
where the BAR is and we can't make sure it doesn't overlap something else.

This preserves the type flags (MEM/IO) so we can keep from enabling the
device.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 1e358f94 29-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fix use of uninitialized MPS value

If "pcie_bus_config == PCIE_BUS_PERFORMANCE", we don't initialize "smpss",
so we pass a pointer to garbage into pcie_bus_configure_set(), where we
compute "mps" based on the garbage. We then pass the garbage "mps" to
pcie_write_mps(), which ignores it in the PCIE_BUS_PERFORMANCE case.

Coverity isn't smart enough to deduce that we ignore the garbage (it's a
lot to expect from a human, too), so initialize "smpss" to a safe value in
all cases.

Found by Coverity (CID 146454).

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 10874f5a 14-Apr-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove unnecessary __ref annotations

Some PCI functions used to be marked __devinit. When CONFIG_HOTPLUG was
not set, these functions were discarded after boot. A few callers of these
__devinit functions were marked __ref to indicate that they could safely
call the __devinit functions even though the callers were not __devinit.

But CONFIG_HOTPLUG and __devinit are now gone, and the need for the __ref
annotations is also gone, so remove them. Relevant historical commits:

54b956b90360 Remove __dev* markings from init.h
a8e4b9c101ae PCI: add generic pci_hp_add_bridge()
0ab2b57f8db8 PCI: fix section mismatch warning in pci_scan_child_bus
451124a7cc6c PCI: fix 4x section mismatch warnings

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 075eb9e3 05-Mar-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Log IDE resource quirk in dmesg

Make a note in dmesg when we overwrite legacy IDE BAR info. We previously
logged something like this:

pci 0000:00:1f.1: reg 0x10: [io 0x0000-0x0007]

and then silently overwrote the resource. There's an example in the
bugzilla below. This doesn't fix the bugzilla; it just makes what's going
on more obvious.

No functional change; merely adds some dev_info() calls.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=48451
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c83bd900 26-Feb-2014 Bjorn Helgaas <bhelgaas@google.com>

PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit

If we don't support 64-bit addresses, i.e., CONFIG_PHYS_ADDR_T_64BIT is not
set, we can't deal with BARs above 4GB. In this case we already pretend
the BAR contained zero; this patch also sets IORESOURCE_UNSET so we can try
to reallocate it later.

I don't think this is exactly correct: what we care about here are *bus*
addresses, not CPU addresses, so the tests of sizeof(resource_size_t)
probably should be on sizeof(dma_addr_t) instead. But this is what's been
in -next, so we'll fix that later.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# fc1b2531 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Don't scan random busses in pci_scan_bridge()

When assigning a new bus number in pci_scan_bridge we check whether
max+1 is free by calling pci_find_bus. If it does already exist then we
assume that we are rescanning and that this is the right bus to scan.

This is fragile. If max+1 lies outside of bus->busn_res.end then we will
rescan some random bus from somewhere else in the hierachy. This patch
checks for this case and prints a warning.

[bhelgaas: add parent/child bus number info to dev_warn()]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c95b0bd6 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Check for child busses which use more bus numbers than allocated

pci_scan_child_bus can (potentially) return a bus number higher than the
subordinate value of the child bus. Possible reasons are that bus numbers
are reserved for SR-IOV or for CardBus (SR-IOV is done without checks and
the CardBus checks are sketchy at best).

We clamp the returned value to the actual subordinate value and print a
warning if too many bus numbers are reserved.

[bhelgaas: whitespace]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f5fb4070 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Remove pci_fixup_parent_subordinate_busnr()

The function has no effect.

If pcibios_assign_all_busses() is not set then the function does nothing.

If it is set then in pci_scan_bridge we are always in the branch where
we assign the bus numbers ourselves and the subordinate values of all
parent busses will be set to 0xff since that is what they inherited from
their parent bus and ultimately from the root bus.

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 1820ffdc 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Make sure bus number resources stay within their parents bounds

Right now we use 0xff for busn_res.end when probing and later reduce it to
the value that is actually used. This does not work if a parent bridge has
already a lower subordinate value. For example during hotplug of a new
bridge below an already-configured bridge the following message is printed
from pci_bus_insert_busn_res():

pci_bus 0000:06: busn_res: can not insert [bus 06-ff] under [bus 05-9b] (conflicts with (null) [bus 05-9b])

This patch clamps the bus range to that of the parent and also ensures that
we do not exceed the parents range when assigning the final subordinate
value.

We also check that busses configured by the firmware fit into their parents
bounds.

[bhelgaas: reword dev_warn() and fix printk format warning]
Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ced04d15 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Use request_resource_conflict() instead of insert_ for bus numbers

If a conflict happens during insert_resource_conflict() and all conflicts
fit within the newly inserted resource then they will become children of
the new resource. This is almost certainly not what we want for bus
numbers.

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 619c8c31 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Assign CardBus bus number only during the second pass

Right now the CardBus code in pci_scan_bridge() is executed during both
passes. Since we always allocate the bus number ourselves it makes sense
to put it into the second pass.

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 2ed85823 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Clarify the "scan anyway" comment in pci_scan_bridge()

Initially when we encountered a bus that was already present we skipped
it. Since 74710ded8e16 'PCI: always scan child buses' we continue
scanning in order to allow user triggered rescans of already existing
busses.

The old comment suggested that the reason for continuing the scan is a
bug in the i450NX chipset. This is not the case.

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9a4d7d87 23-Jan-2014 Andreas Noever <andreas.noever@gmail.com>

PCI: Increment max correctly in pci_scan_bridge()

This patch fixes two small issues:
- If pci_add_new_bus() fails, max must not be incremented. Otherwise
an incorrect value is returned from pci_scan_bridge().
- If the bus is already present, max must be incremented. I think
that this case should only be hit if we trigger a manual rescan of a
CardBus bridge.

Signed-off-by: Andreas Noever <andreas.noever@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 04480094 01-Feb-2014 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Revert "PCI: Remove from bus_list and release resources in pci_release_dev()"

Revert commit ef83b0781a73 "PCI: Remove from bus_list and release
resources in pci_release_dev()" that made some nasty race conditions
become possible. For example, if a Thunderbolt link is unplugged
and then replugged immediately, the pci_release_dev() resulting from
the hot-remove code path may be racing with the hot-add code path
which after that commit causes various kinds of breakage to happen
(up to and including a hard crash of the whole system).

Moreover, the problem that commit ef83b0781a73 attempted to address
cannot happen any more after commit 8a4c5c329de7 "PCI: Check parent
kobject in pci_destroy_dev()", because pci_destroy_dev() will now
return immediately if it has already been executed for the given
device.

Note, however, that the invocation of msi_remove_pci_irq_vectors()
removed by commit ef83b0781a73 from pci_free_resources() along with
the other changes made by it is not added back because of subsequent
code changes depending on that modification.

Fixes: ef83b0781a73 (PCI: Remove from bus_list and release resources in pci_release_dev())
Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 9d16947b 10-Jan-2014 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

PCI: Add global pci_lock_rescan_remove()

There are multiple PCI device addition and removal code paths that may be
run concurrently with the generic PCI bus rescan and device removal that
can be triggered via sysfs. If that happens, it may lead to multiple
different, potentially dangerous race conditions.

The most straightforward way to address those problems is to run
the code in question under the same lock that is used by the
generic rescan/remove code in pci-sysfs.c. To prepare for those
changes, move the definition of the global PCI remove/rescan lock
to probe.c and provide global wrappers, pci_lock_rescan_remove()
and pci_unlock_rescan_remove(), allowing drivers to manipulate
that lock. Also provide pci_stop_and_remove_bus_device_locked()
for the callers of pci_stop_and_remove_bus_device() who only need
to hold the rescan/remove lock around it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 0b950f0f 10-Jan-2014 Stephen Hemminger <stephen@networkplumber.org>

PCI: Make local functions static

Using 'make namespacecheck' identify code which should be declared static.
Checked for users in other driver/archs as well. Compile tested only.

This stops exporting the following interfaces to modules:

pci_target_state()
pci_load_saved_state()

[bhelgaas: retained pci_find_next_ext_capability() and pci_cfg_space_size()]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e2760c54 10-Jan-2014 Stephen Hemminger <stephen@networkplumber.org>

PCI: Remove unused alloc_pci_dev()

My philosophy is unused code is dead code. And dead code is subject to bit
rot and is a likely source of bugs. Use it or lose it.

This removes this unused and deprecated interface:

alloc_pci_dev()

[bhelgaas: split to separate patch]
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# fc279850 09-Dec-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Convert pcibios_resource_to_bus() to take a pci_bus, not a pci_dev

These interfaces:

pcibios_resource_to_bus(struct pci_dev *dev, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_dev *dev, *resource, *bus_region)

took a pci_dev, but they really depend only on the pci_bus. And we want to
use them in resource allocation paths where we have the bus but not a
device, so this patch converts them to take the pci_bus instead of the
pci_dev:

pcibios_resource_to_bus(struct pci_bus *bus, *bus_region, *resource)
pcibios_bus_to_resource(struct pci_bus *bus, *resource, *bus_region)

In fact, with standard PCI-PCI bridges, they only depend on the host
bridge, because that's the only place address translation occurs, but
we aren't going that far yet.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ef83b078 30-Nov-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Remove from bus_list and release resources in pci_release_dev()

Previously we removed the pci_dev from the bus_list and released its
resources in pci_destroy_dev(). But that's too early: it's possible to
call pci_destroy_dev() twice for the same device (e.g., via sysfs), and
that will cause an oops when we try to remove it from bus_list the second
time.

We should remove it from the bus_list only when the last reference to the
pci_dev has been released, i.e., in pci_release_dev().

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ef37702e 30-Nov-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Move pci_proc_attach_device() to pci_bus_add_device()

4f535093cf8f ("PCI: Put pci_dev in device tree as early as possible")
moved pci_proc_attach_device() from pci_bus_add_device() to
pci_device_add().

This moves it back to pci_bus_add_device(), essentially reverting that
part of 4f535093cf8f. This makes it symmetric with pci_stop_dev(),
where we call pci_proc_detach_device() and pci_remove_sysfs_dev_files()
and set dev->is_added = 0.

[bhelgaas: changelog, create sysfs then attach proc for symmetry]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f7625980 14-Nov-2013 Bjorn Helgaas <bhelgaas@google.com>

PCI: Fix whitespace, capitalization, and spelling errors

Fix whitespace, capitalization, and spelling errors. No functional change.
I know "busses" is not an error, but "buses" was more common, so I used it
consistently.

Signed-off-by: Marta Rybczynska <rybczynska@gmail.com> (pci_reset_bridge_secondary_bus())
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 115e3bc5 02-Sep-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Remove unused "is_pcie" from pci_dev structure

No one uses "is_pcie" now; remove this obsolete member.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# fdfe1511 05-Sep-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Use pci_is_pcie() to simplify code

Use pci_is_pcie() instead of pci_find_capability() to simplify code.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5895af79 26-Aug-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Warn if unsafe MPS settings detected

If a BIOS configures MPS incorrectly, devices may not work normally.
For example, if a bridge has MPS set larger than an endpoint below it,
the endpoint may discard packets.

To help diagnose this issue, print a warning if we find an endpoint
MPS setting different than that of the upstream bridge.

[bhelgaas: changelog, "bridge" temporary, warning text]
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=60799
Reported-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jon Mason <jdmason@kudzu.us>


# 3315472c 26-Aug-2013 Jon Mason <jdmason@kudzu.us>

PCI: Fix MPS peer-to-peer DMA comment syntax

Correct minor wording issue in MPS peer-to-peer comment. Noticed by Don
Dutile.

Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 808e34e2 22-Aug-2013 Zoltan Kiss <zoltan.kiss@citrix.com>

PCI: Disable decoding for BAR sizing only when it was actually enabled

We disable BARs while sizing them so we don't cause conflicts with other
devices (see 253d2e5498 and bbffe43524). But if device decoding is already
disabled before we size the BAR, we don't need to disable it again.

[bhelgaas: changelog, add PCI_COMMAND_DECODING_ENABLE for readability]
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d4aa68f6 21-Aug-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Don't restrict MPS for slots below Root Ports

When booting with "pci=pcie_bus_safe", we previously limited the
fabric MPS to 128 when we found:

(1) A hotplug-capable Downstream Port ("dev->is_hotplug_bridge &&
pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT"), or

(2) A hotplug-capable Root Port with a slot that was either empty or
contained a multi-function device ("dev->is_hotplug_bridge &&
!list_is_singular(&dev->bus->devices)")

Part (1) is valid, but part (2) is not.

After a hot-add in the slot below a Root Port, we can reconfigure all
MPS values in the fabric below the Root Port because the new device is
the only thing below the Root Port and there are no active drivers.
Therefore, there's no reason to limit the MPS for Root Ports, no
matter what's in the slot.

Test info:

-+-[0000:40]-+-07.0-[0000:46]--+-00.0 Intel 82576 NIC
\-00.1 Intel 82576 NIC

0000:40:07.0 Root Port bridge to [bus 46] (MPS supported=256)
0000:46:00.0 Endpoint (MPS supported=512)
0000:46:00.1 Endpoint (MPS supported=512)

# echo 0 > /sys/bus/pci/slots/7/power
# echo 1 > /sys/bus/pci/slots/7/power
pcieport 0000:40:07.0: PCI-E Max Payload Size set to 256/ 256 (was 256)
pci 0000:46:00.0: PCI-E Max Payload Size set to 256/ 512 (was 128)
pci 0000:46:00.1: PCI-E Max Payload Size set to 256/ 512 (was 128)

Before this change, we set MPS to 128 for the Root Port and both NICs
because the slot contained a multi-function device and

dev->is_hotplug_bridge && !list_is_singular(&dev->bus->devices)

was true. After this change, we set it to 256.

[bhelgaas: changelog, comments, split out upstream bridge check]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Jon Mason <jdmason@kudzu.us>


# c2996948 21-Aug-2013 Bjorn Helgaas <bhelgaas@google.com>

PCI: Simplify MPS test for Downstream Port

PCIe hotplug bridges are always either Root Ports or Downstream Ports. No
other device type can have a PCIe link leading downstream to a slot.

Root Ports don't have an upstream bridge, so "dev->is_hotplug_bridge &&
dev->bus->self" is true if and only if "dev" is a Downstream Port. That
means we can simplify this by looking at the type of "dev" itself, without
looking upstream at all.

No functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a58674ff 21-Aug-2013 Bjorn Helgaas <bhelgaas@google.com>

PCI: Simplify pcie_bus_configure_settings() interface

Based on a patch by Jon Mason (see URL below).

All users of pcie_bus_configure_settings() pass arguments of the form
"bus, bus->self->pcie_mpss". The "mpss" argument is redundant since we
can easily look it up internally. In addition, all callers check
"bus->self" for NULL, which we can also do internally.

This patch simplifies the interface and the callers. No functional change.

Reference: http://lkml.kernel.org/r/1317048850-30728-2-git-send-email-mason@myri.com
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 2c25e34c 21-Aug-2013 Bjorn Helgaas <bhelgaas@google.com>

PCI: Drop "PCI-E" prefix from Max Payload Size message

The conventional spelling is "PCIe", but I think even that is superfluous,
so remove the whole thing.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 0cbdcfcf 09-Aug-2013 Thierry Reding <thierry.reding@avionic-design.de>

PCI: Introduce new MSI chip infrastructure

The new struct msi_chip is used to associated an MSI controller with a
PCI bus. It is automatically handed down from the root to its children
during bus enumeration.

This patch provides default (weak) implementations for the architecture-
specific MSI functions (arch_setup_msi_irq(), arch_teardown_msi_irq()
and arch_msi_check_device()) which check if a PCI device's bus has an
attached MSI chip and forward the call appropriately.

Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Daniel Price <daniel.price@gmail.com>
Tested-by: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>


# 343e51ae 31-Jul-2013 Jacob Keller <jacob.e.keller@intel.com>

PCI: expose pcie_link_speed and pcix_bus_speed arrays

pcie_link_speed and pcix_bus_speed are arrays used by probe.c to correctly
convert lnksta register values into the pci_bus_speed enum. These static arrays
are useful outside probe for this purpose. This patch makes these defines into
conist arrays and exposes them with an extern header in drivers/pci/pci.h

-v2-
* move extern declarations to drivers/pci/pci.h

CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>


# 928bea96 22-Jul-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Delay enabling bridges until they're needed

We currently enable PCI bridges after scanning a bus and assigning
resources. This is often done in arch code.

This patch changes this so we don't enable a bridge until necessary, i.e.,
until we enable a PCI device behind the bridge. We do this in the generic
pci_enable_device() path, so this also removes the arch-specific code to
enable bridges.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 56039e65 24-Jul-2013 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI: Convert class code to use dev_groups

The dev_attrs field of struct class is going away soon, dev_groups
should be used instead. This converts the PCI class code to use the
correct field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 44b9ca47 05-Jun-2013 Sebastian Ott <sebott@linux.vnet.ibm.com>

pci: add pcibios_release_device

Platforms may want to provide architecture-specific functionality when
a pci device is released. Add a pcibios_release_device() call that
architectures can override to do so.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>


# 05013486 05-Jun-2013 Bjorn Helgaas <bhelgaas@google.com>

PCI: Return early on allocation failures to unindent mainline code

On allocation failure, return early so the main body of the function
doesn't have to be indented as the body of an "if" statement. No
functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 70efde2a 07-Jun-2013 Jiang Liu <liuj97@gmail.com>

PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()

This renames pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()
and moves it next to pci_alloc_host_bridge(). No functional change.

[bhelgaas: split rename & move out of create/destroy symmetry patch]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 343df771 06-Jun-2013 Jiang Liu <liuj97@gmail.com>

PCI: Fix refcount issue in pci_create_root_bus() error recovery path

After calling device_register(&bridge->dev), the bridge is reference-
counted, and it is illegal to call kfree() on it except in the release
function.

[bhelgaas: changelog, use put_device() after device_register() failure]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org


# 8b1fce04 25-May-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)

Use the new pci_alloc_dev(bus) to replace the existing using of
alloc_pci_dev(void).

[bhelgaas: drop pci_bus ref later in pci_release_dev()]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>


# 6ae32c53 04-Jun-2013 Sebastian Ott <sebott@linux.vnet.ibm.com>

PCI: Add pcibios_release_device()

Platforms may want to provide architecture-specific functionality when
a PCI device is released. Add a pcibios_release_device() call that
architectures can override to do so.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3c6e6ae7 25-May-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev()

Here we introduce a new interface to replace alloc_pci_dev():

struct pci_dev *pci_alloc_dev(struct pci_bus *bus)

It takes a "struct pci_bus *" argument, so we can alloc a PCI device
on a target PCI bus, and it acquires a reference on the pci_bus.
We use pci_alloc_dev(NULL) to simplify the old alloc_pci_dev(),
and keep it for a while but mark it as __deprecated.

Holding a reference to the pci_bus ensures that referencing
pci_dev->bus is valid as long as the pci_dev is valid.

[bhelgaas: keep existing "return error early" structure in pci_alloc_dev()]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# cf4d1cf5 25-May-2013 Kevin Hao <haokexin@gmail.com>

PCI: Unset resource if initial BAR value is invalid

The initial BAR value in the following example is invalid:

pci_bus 0000:00: root bus resource [mem 0xa0000000-0xbfffffff] (bus address [0xe0000000-0xffffffff])
pci 0000:01:00.0: reg 10: initial BAR value: 0xa0000000
pci 0000:01:00.0: reg 10: [mem 0xa0000000-0xa000007f 64bit]

bus_to_resource(0xa0000000) yields 0xa0000000 because there's no host
bridge window whose bus address range contains 0xa0000000. But CPU
accesses to 0xa0000000 appear on the bus at 0xe0000000, so they will
not be claimed if the BAR contains 0xa0000000.

If we find a BAR where resource_to_bus(bus_to_resource(A)) != A, we can
work around this problem by reassigning the BAR.

[bhelgaas: changelog, comment]
Reference: https://lkml.kernel.org/r/1368536876-27307-3-git-send-email-haokexin@gmail.com
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 96ddef25 25-May-2013 Kevin Hao <haokexin@gmail.com>

PCI: Consolidate calls to pcibios_bus_to_resource() in __pci_read_base()

Since we will invoke pcibios_bus_to_resource() unconditionally if we
don't goto fail, move it out of if/else wrap. No function change.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 33963e30 25-May-2013 Kevin Hao <haokexin@gmail.com>

PCI: Add 0x prefix to BAR register position in __pci_read_base()

We print the BAR register's position in hexadecimal format, so it
is more readable if 0x prefix is added.

[bhelgaas: keep dev_printk(), not dev_dbg(), so this is always in dmesg]
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e253aaf0 07-May-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Delay final fixups until resources are assigned

Commit 4f535093cf "PCI: Put pci_dev in device tree as early as possible"
moved final fixups from pci_bus_add_device() to pci_device_add(). But
pci_device_add() happens before resource assignment, so BARs may not be
valid yet.

Typical flow for hot-add:

pciehp_configure_device
pci_scan_slot
pci_scan_single_device
pci_device_add
pci_fixup_device(pci_fixup_final, dev) # previous location
# resource assignment happens here
pci_bus_add_devices
pci_bus_add_device
pci_fixup_device(pci_fixup_final, dev) # new location

[bhelgaas: changelog, move fixups to pci_bus_add_device()]
Reference: https://lkml.kernel.org/r/20130415182614.GB9224@xanatos
Reported-by: David Bulkow <David.Bulkow@stratus.com>
Tested-by: David Bulkow <David.Bulkow@stratus.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org # v3.9+


# 88e7b167 07-Apr-2013 Brian King <brking@linux.vnet.ibm.com>

pci: Set dev->dev.type in alloc_pci_dev

Set dev->dev.type in alloc_pci_dev so that archs that have their own
versions of pci_setup_device get this set properly in order to ensure
things like the boot_vga sysfs parameter get created as expected.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>


# 10a95747 11-Apr-2013 Jiang Liu <liuj97@gmail.com>

PCI: Add pcibios hooks for adding and removing PCI buses

On ACPI-based platforms, the pci_slot driver creates PCI slot devices
according to information from ACPI tables by registering an ACPI PCI
subdriver. The ACPI PCI subdriver will only be called when creating/
destroying PCI root buses, and it won't be called when hot-plugging
P2P bridges. It may cause stale PCI slot devices after hot-removing
a P2P bridge if that bridge has associated PCI slots. And the acpiphp
driver has the same issue too.

This patch introduces two hook points into the PCI core, which will
be invoked when creating/destroying PCI buses for PCI host and P2P
bridges. They could be used to setup/destroy platform dependent stuff
in a unified way, both at boot time and for PCI hotplug operations.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Myron Stowe <myron.stowe@redhat.com>


# 981cf9ea 11-Apr-2013 Jiang Liu <liuj97@gmail.com>

PCI: Clean up usages of pci_bus->is_added

Now pci_bus->is_added is only used to guard invoking of
pcibios_fixup_bus() in pci_scan_child_bus(), so just set
it directly after the fixups and remove the other test
and set in pci_bus_add_devices().

[bhelgaas: changelog]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Yinghai Lu <yinghai@kernel.org>


# 4f535093 21-Jan-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Put pci_dev in device tree as early as possible

We want to put pci_dev structs in the device tree as soon as possible so
for_each_pci_dev() iteration will not miss them, but driver attachment
needs to be delayed until after pci_assign_unassigned_resources() to make
sure all devices have resources assigned first.

This patch moves device registering from pci_bus_add_devices() to
pci_device_add(), which happens earlier, leaving driver attachment in
pci_bus_add_devices().

It also removes unattached child bus handling in pci_bus_add_devices().
That's not needed because child bus via pci_add_new_bus() is already
in parent bus children list.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e723f0b4 21-Jan-2013 Jiang Liu <jiang.liu@huawei.com>

PCI: Make device create/destroy logic symmetric

According to device model documentation, the way to create/destroy PCI
devices should be symmetric. The rule is to either use
1) device_register()/device_unregister()
or
2) device_initialize()/device_add()/device_del()/put_device().

So change PCI core logic to follow the rule and get rid of the redundant
pci_dev_get()/pci_dev_put() pair.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7629d19a 21-Jan-2013 Yinghai Lu <yinghai@kernel.org>

PCI: Set pci_dev dev_node early so IOAPIC irq_descs are allocated locally

Otherwise irq_desc for PCI bridge with hot-added IOAPIC may not be
allocated on the local node.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b1bd58e4 25-Jan-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Consolidate "next-function" functions

There are several next_fn functions (no_next_fn, next_trad_fn,
next_ari_fn); consolidate them in next_fn() to simplify the code.

[bhelgaas: make next_fn() static, rework control flow]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 31ab2476 14-Jan-2013 Yijing Wang <wangyijing@huawei.com>

PCI: Rename pci_enable_ari() to pci_configure_ari()

pci_enable_ari() now supports enabling or disabling ARI forwarding. So
rename pci_enable_ari() to pci_configure_ari() for easy understanding.

No functional change.

[bhelgaas: changelog]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6c0cc950 09-Jan-2013 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / PCI: Set root bridge ACPI handle in advance

The ACPI handles of PCI root bridges need to be known to
acpi_bind_one(), so that it can create the appropriate
"firmware_node" and "physical_node" files for them, but currently
the way it gets to know those handles is not exactly straightforward
(to put it lightly).

This is how it works, roughly:

1. acpi_bus_scan() finds the handle of a PCI root bridge,
creates a struct acpi_device object for it and passes that
object to acpi_pci_root_add().

2. acpi_pci_root_add() creates a struct acpi_pci_root object,
populates its "device" field with its argument's address
(device->handle is the ACPI handle found in step 1).

3. The struct acpi_pci_root object created in step 2 is passed
to pci_acpi_scan_root() and used to get resources that are
passed to pci_create_root_bus().

4. pci_create_root_bus() creates a struct pci_host_bridge object
and passes its "dev" member to device_register().

5. platform_notify(), which for systems with ACPI is set to
acpi_platform_notify(), is called.

So far, so good. Now it starts to be "interesting".

6. acpi_find_bridge_device() is used to find the ACPI handle of
the given device (which is the PCI root bridge) and executes
acpi_pci_find_root_bridge(), among other things, for the
given device object.

7. acpi_pci_find_root_bridge() uses the name (sic!) of the given
device object to extract the segment and bus numbers of the PCI
root bridge and passes them to acpi_get_pci_rootbridge_handle().

8. acpi_get_pci_rootbridge_handle() browses the list of ACPI PCI
root bridges and finds the one that matches the given segment
and bus numbers. Its handle is then used to initialize the
ACPI handle of the PCI root bridge's device object by
acpi_bind_one(). However, this is *exactly* the ACPI handle we
started with in step 1.

Needless to say, this is quite embarassing, but it may be avoided
thanks to commit f3fd0c8 (ACPI: Allow ACPI handles of devices to be
initialized in advance), which makes it possible to initialize the
ACPI handle of a device before passing it to device_register().

Accordingly, add a new __weak routine, pcibios_root_bridge_prepare(),
defaulting to an empty implementation that can be replaced by the
interested architecutres (x86 and ia64 at the moment) with functions
that will set the root bridge's ACPI handle before its dev member is
passed to device_register(). Make both x86 and ia64 provide such
implementations of pcibios_root_bridge_prepare() and remove
acpi_pci_find_root_bridge() and acpi_get_pci_rootbridge_handle() that
aren't necessary any more.

Included is a fix for breakage on systems with non-ACPI PCI host
bridges from Bjorn Helgaas.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d2e5f0c1 22-Dec-2012 Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ACPI / PCI: Rework the setup and cleanup of device wakeup

Currently, the ACPI wakeup capability of PCI devices is set up
in two different places, partially in acpi_pci_bind() where
runtime wakeup is initialized and partially in
platform_pci_wakeup_init(), where system wakeup is initialized.
The cleanup is only done in acpi_pci_unbind() and it only covers
runtime wakeup.

Use the new .setup() and .cleanup() callbacks in struct acpi_bus_type
to consolidate that code and do the setup and the cleanup each in one
place.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Toshi Kani <toshi.kani@hp.com>


# 231afea1 05-Dec-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: Use standard PCIe Capability Link register field names

Use the standard #defines for PCIe Link Status and Capability registers
rather than bare numbers.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7793eeab 05-Dec-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: Add and use standard PCI-X Capability register names

Add and use #defines for PCI-X Capability registers and fields.
Note that the PCI-X Capability has a different layout for
type 0 (endpoint) and type 1 (bridge) devices.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 15856ad5 21-Nov-2012 Bill Pemberton <wfp5p@virginia.edu>

PCI: Remove __dev* markings

CONFIG_HOTPLUG is going away as an option so __devexit_p, __devint,
__devinitdata, __devinitconst, and _devexit are no longer needed.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b40b97ae 21-Nov-2012 Bill Pemberton <wfp5p@virginia.edu>

PCI: Remove CONFIG_HOTPLUG ifdefs

Remove conditional code based on CONFIG_HOTPLUG being false. It's
always on now in preparation of it going away as an option.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 4e15c46b 05-Nov-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Add pci_device_type to pdev's device struct

Need type filled in device structure so it can be used for visible
attribute control in sysfs for pci_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e164f658 30-Oct-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Move out pci_enable_bridges out of assign_unsigned_bus_res

So could use assign_unassigned_bus_res pci root bus add

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a5213a31 30-Oct-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Move pci_rescan_bus() back to probe.c

We have pci_assign_unassigned_bus_resources() in as global function now.

Move pci_rescan_bus() back to probe.c where it should be.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 1965f66e 10-Sep-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Check P2P bridge for invalid secondary/subordinate range

For bridges with "secondary > subordinate", i.e., invalid bus number
apertures, we don't enumerate anything behind the bridge unless the
user specified "pci=assign-busses".

This patch makes us automatically try to reassign the downstream bus
numbers in this case (just for that bridge, not for all bridges as
"pci=assign-busses" does).

We don't discover all the devices on the Intel DP43BF motherboard
without this change (or "pci=assign-busses") because its BIOS configures
a bridge as:

pci 0000:00:1e.0: PCI bridge to [bus 20-08] (subtractive decode)

[bhelgaas: changelog, change message to dev_info]
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=18412
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=625754
Reported-by: Brian C. Huffman <bhuffman@graze.net>
Reported-by: VL <vl.homutov@gmail.com>
Tested-by: VL <vl.homutov@gmail.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org


# f8ba65e8 24-Aug-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: Remove bus number resource debug messages

These messages don't seem to add much value.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 0ff9514b 23-Aug-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: Don't print anything while decoding is disabled

If we try to print to the console device while its decoding is disabled,
the system will hang.

Reported-and-tested-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Olof Johansson <olof@lixom.net>


# 59875ae4 24-Jul-2012 Jiang Liu <jiang.liu@huawei.com>

PCI/core: Use PCI Express Capability accessors

Use PCI Express Capability access functions to simplify core.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b2ef39be 24-Jul-2012 Yijing Wang <wangyijing@huawei.com>

PCI: Remove unused field pcie_type from struct pci_dev

With introduction of pci_pcie_type(), pci_dev->pcie_type field becomes
redundant, so remove it.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 62f87c0e 24-Jul-2012 Yijing Wang <wangyijing@huawei.com>

PCI: Introduce pci_pcie_type(dev) to replace pci_dev->pcie_type

Introduce an inline function pci_pcie_type(dev) to extract PCIe
device type from pci_dev->pcie_flags_reg field, and prepare for
removing pci_dev->pcie_type.

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 786e2288 24-Jul-2012 Yijing Wang <wangyijing@huawei.com>

PCI: Add pcie_flags_reg to cache PCIe capabilities register

Since PCI Express Capabilities Register is read only, cache its value
into struct pci_dev to avoid repeatedly calling pci_read_config_*().

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 2b28ae19 09-Jul-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2)

9d265124d051 and 15a260d53f7c added quirks for P2P bridges that support
I/O windows that start/end at 1K boundaries, not just the 4K boundaries
defined by the PCI spec. For details, see the IOBL_ADR register and the
EN1K bit in the CNF register in the Intel 82870P2 (P64H2).

These quirks complicate the code that reads P2P bridge windows
(pci_read_bridge_io() and pci_cfg_fake_ranges()) because the bridge
I/O resource is updated in the HEADER quirk, in pci_read_bridge_io(),
in pci_setup_bridge(), and again in the FINAL quirk. This is confusing
and makes it impossible to reassign the bridge windows after FINAL
quirks are run.

This patch adds support for 1K windows in the generic paths, so the
HEADER quirk only has to enable this support. The FINAL quirk, which
used to undo damage done by pci_setup_bridge(), is no longer needed.

This removes "if (!res->start) res->start = ..." from pci_read_bridge_io();
that was part of 9d265124d051 to avoid overwriting the resource filled in
by the quirk. Since pci_read_bridge_io() itself now knows about
granularity, the quirk no longer updates the resource and this test is no
longer needed.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# bbffe435 06-Jul-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too

After 253d2e5498, we disable MEM and IO decoding for most devices while we
size 32-bit BARs. However, we restore the original COMMAND register before
we size the upper 32 bits of 64-bit BARs, so we can still cause a conflict.

This patch waits to restore the original COMMAND register until we're
completely finished sizing the BAR.

Reference: https://lkml.org/lkml/2007/8/25/154
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5dde383e 09-Jul-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: allow P2P bridge windows starting at PCI bus address zero

cd81e1ea1a4c added checks that prevent us from using P2P bridge windows
that start at PCI bus address zero. The reason was to "prevent us from
overwriting resources that are unassigned."

But generic code should allow address zero in both BARs and bridge
windows, so I think that commit was a mistake.

Windows at bus address zero are legal and likely to exist on machines with
an offset between bus addresses and CPU addresses. For example, in the
following hypothetical scenario, the bridge at 00:01.0 has a window at bus
address zero and the device at 01:00.0 has a BAR at bus address zero, and
I think both are perfectly valid:

PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [mem 0x100000000-0x1ffffffff] (bus address [0x00000000-0xffffffff])
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:01.0: bridge window [mem 0x100000000-0x100ffffff]
pci 0000:01:00.0: reg 10: [mem 0x100000000-0x100ffffff]

Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 8f38eaca 19-Jun-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: fix P2P bridge I/O port window sign extension

On P2P bridges with 32-bit I/O decoding, we incorrectly sign-extended
windows starting at 0x80000000 or above. In "base |= (io_base_hi << 16)",
"io_base_hi" is promoted to a signed int before being extended to an
unsigned long.

This would cause a window starting at I/O address 0x80000000 to be
treated as though it started at 0xffffffff80008000 instead, which
should cause "no compatible bridge window" errors when we enumerate
devices using that I/O space.

The mmio and mmio_pref casts are not strictly necessary, but without
them, correctness depends on the types of the PCI_MEMORY_RANGE_MASK and
PCI_PREF_RANGE_MASK constants, which are not obvious from reading the
local code.

Found by Coverity (CID 138747 and CID 138748).

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# bc76b731 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: insert busn_res for child bus

Now we can insert busn_res now, after all root bus's get inserted.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 857c3b66 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: add default busn_res for pci_scan_bus()

also do not need to shrink busn_res.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 67cdc827 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: add default busn_resource

We need to put into the resources list for legacy system.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 4d99f524 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: checking busn_res in pci_scan_root_bus()

Some callers do not supply the bus number aperture, usually because they do
not know the end. In this case, we assume the aperture extends from the
root bus number to bus 255, scan the bus, and shrink the bus number
resource so it ends at the largest bus number we found.

This is obviously not correct because the actual end of the aperture may
well be larger than the largest bus number we found. But I guess it's all
we have for now.

Also print out one info about that, so we could find out which path
does not have busn_res in resources list.

[bhelgaas: changelog, _safe iterator unnecessary, use %pR format for bus]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>


# f848ffb1 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: insert busn_res in pci_create_root_bus()

That busn_res is from resources list.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 98a35831 18-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: add busn_res operation functions

Will use them insert/update busn res in pci_bus struct.

[bhelgaas: print conflicting entry if insertion fails]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5cc62c20 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: build a bus number resource tree for every domain

This adds get_pci_domain_busn_res(), which returns the root of the
bus number resource tree for a domain, creating it if necessary.
We will later populate the tree with the bus numbers used by host
bridges and P2P bridges in the domain.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b918c62e 17-May-2012 Yinghai Lu <yinghai@kernel.org>

PCI: replace struct pci_bus secondary/subordinate with busn_res

Replace the struct pci_bus secondary/subordinate members with the
struct resource busn_res. Later we'll build a resource tree of these
bus numbers.

[bhelgaas: changelog]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 284f5f9d 30-Apr-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: work around Stratus ftServer broken PCIe hierarchy

A PCIe downstream port is a P2P bridge. Its secondary interface is
a link that should lead only to device 0 (unless ARI is enabled)[1], so
we don't probe for non-zero device numbers.

Some Stratus ftServer systems have a PCIe downstream port (02:00.0) that
leads to both an upstream port (03:00.0) and a downstream port (03:01.0),
and 03:01.0 has important devices below it:

[0000:02]-+-00.0-[03-3c]--+-00.0-[04-09]--...
\-01.0-[0a-0d]--+-[USB]
+-[NIC]
+-...

Previously, we didn't enumerate device 03:01.0, so USB and the network
didn't work. This patch adds a DMI quirk to scan all device numbers,
not just 0, below a downstream port.

Based on a patch by Prarit Bhargava.

[1] PCIe spec r3.0, sec 7.3.1

CC: Myron Stowe <mstowe@redhat.com>
CC: Don Dutile <ddutile@redhat.com>
CC: James Paradis <james.paradis@stratus.com>
CC: Matthew Wilcox <matthew.r.wilcox@intel.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 4fa2649a 02-Apr-2012 Yinghai Lu <yinghai@kernel.org>

PCI: add host bridge release support

We need a hook to release host bridge resources allocated when creating
root bus.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7b543663 02-Apr-2012 Yinghai Lu <yinghai@kernel.org>

PCI: add generic device into pci_host_bridge struct

Use that device for pci_root_bus bridge pointer.

Use pci_release_bus_bridge_dev() to release allocated pci_host_bridge in
remove path.

Use root bus bridge pointer to get host bridge pointer instead of searching
host bridge list. That leaves the host bridge list unused, so remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 610929e1 02-Apr-2012 Yinghai Lu <yinghai@kernel.org>

PCI: move host bridge-related code to host-bridge.c

Move host bridge-related code from probe.c to a new host-bridge.c.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# cf48fb6a 16-Mar-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: fix bridge I/O window bus-to-resource conversion

In 5bfa14ed9f3c, I forgot to initialize res2.flags before calling
pcibios_bus_to_resource(), which depends on the resource type to locate the
correct aperture. This bug won't hurt x86, which currently never has an
offset between bus and CPU addresses, but will affect other architectures.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 2069ecfb 15-Feb-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Move "pci reassigndev resource alignment" out of quirks.c

This isn't really a quirk; calling it directly from pci_add_device makes
more sense.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# fb127cb9 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: collapse pcibios_resource_to_bus

Everybody uses the generic pcibios_resource_to_bus() supplied by the core
now, so remove the ARCH_HAS_GENERIC_PCI_OFFSETS used during conversion.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 36a66cd6 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: add generic pcibios_resource_to_bus()

This replaces the generic versions of pcibios_resource_to_bus() and
pcibios_bus_to_resource() in asm-generic/pci.h with versions that use
pci_resource_to_bus() and pci_bus_to_resource().

The replacements are equivalent except that they can apply host
bridge window offsets when the arch has supplied them by using
pci_add_resource_offset().

Each arch can convert to using pci_add_resource_offset() individually by
removing its device resource fixups from pcibios_fixup_bus() and supplying
ARCH_HAS_GENERIC_PCI_OFFSETS. ARCH_HAS_GENERIC_PCI_OFFSETS can be removed
after all have converted.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5bfa14ed 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: convert bus addresses to resource when reading BARs

Some PCI host bridges translate CPU addresses to PCI bus addresses.
Previously, we initialized pci_dev resources with PCI bus addresses,
then converted them to CPU addresses later in arch-specific code
(pcibios_fixup_resources()), which leaves a window of time where the
pci_dev resources are incorrect.

This patch adds support in the core for this address translation.
When the arch creates the root bus, it can supply the host bridge
address translation information, and the core can use it to set the
pci_dev resources correctly from the beginning.

This gives us a way to fix the problem that quirks that run between device
discovery and pcibios_fixup_resources() fail because they use pci_dev
resources that haven't been converted. The reference below is to one
such problem that affected ARM and ia64.

Note that this patch has no effect until an arch starts using
pci_add_resource_offset() with a non-zero offset: before that, all
all host bridge windows have a zero offset and pci_bus_to_resource()
copies the pci_bus_region directly to the struct resource.

Reference: https://lkml.org/lkml/2009/10/12/405
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 0efd5aab 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: add struct pci_host_bridge_window with CPU/bus address offset

Some PCI host bridges apply an address offset, so bus addresses on PCI are
different from CPU addresses. This patch adds a way for architectures to
tell the PCI core about this offset. For example:

LIST_HEAD(resources);
pci_add_resource_offset(&resources, host->io_space, host->io_offset);
pci_add_resource_offset(&resources, host->mem_space, host->mem_offset);
pci_scan_root_bus(parent, bus, ops, sysdata, &resources);

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5a21d70d 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: add struct pci_host_bridge and a list of all bridges found

This adds a list of all PCI host bridges we find and a way to look up
the host bridge from a pci_dev.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a5390aa6 23-Feb-2012 Bjorn Helgaas <bhelgaas@google.com>

PCI: don't publish new root bus until it's fully initialized

When pci_create_root_bus() adds the new struct pci_bus to the global
pci_root_buses list, the bus becomes visible to other parts of the
kernel, so it should be fully initialized.

This patch delays adding the bus to the pci_root_buses list until after
all the struct pci_bus initialization is finished.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# f796841e 11-Feb-2012 Yinghai Lu <yinghai@kernel.org>

PCI: fix memleak for pci dev removing during hotplug

unreferenced object 0xffff880276d17700 (size 64):
comm "swapper/0", pid 1, jiffies 4294897182 (age 3976.028s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 18 f9 de 76 02 88 ff ff ...........v....
10 00 00 00 0e 00 00 00 0f 28 40 00 00 00 00 00 .........(@.....
backtrace:
[<ffffffff81c8aede>] kmemleak_alloc+0x26/0x43
[<ffffffff811385f0>] __kmalloc+0x121/0x183
[<ffffffff813cf821>] pci_add_cap_save_buffer+0x35/0x7c
[<ffffffff813d12b7>] pci_allocate_cap_save_buffers+0x1d/0x65
[<ffffffff813cdb52>] pci_device_add+0x92/0xf1
[<ffffffff81c8afe6>] pci_scan_single_device+0x9f/0xa1
[<ffffffff813cdbd2>] pci_scan_slot.part.20+0x21/0x106
[<ffffffff813cdce2>] pci_scan_slot+0x2b/0x35
[<ffffffff81c8dae4>] __pci_scan_child_bus+0x51/0x107
[<ffffffff81c8d75b>] pci_scan_bridge+0x376/0x6ae
[<ffffffff81c8db60>] __pci_scan_child_bus+0xcd/0x107
[<ffffffff81c8dbab>] pci_scan_child_bus+0x11/0x2a
[<ffffffff81cca58c>] pci_acpi_scan_root+0x18b/0x21c
[<ffffffff81c916be>] acpi_pci_root_add+0x1e1/0x42a
[<ffffffff81406210>] acpi_device_probe+0x50/0x190
[<ffffffff814a0227>] really_probe+0x99/0x126

Need to free saved_buffer for capabilities.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 2dd8ba92 19-Feb-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Fix device class print out

Found debug print of class is shifted.

| pci 0000:f8:15.2: [8086:2b56] type 0 class 0x000600

Code is trying to print class with 6 digits, but use shifted class with
4 digits valid value as variable.

Change to original dev->class directly.

Also remove not needed calculating of local variable class, because it
will be updated after pci_fixup_device(pci_fixup_early...)

Also unify type print out when class and header is not matched.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# efdc87da 27-Jan-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Separate pci_bus_read_dev_vendor_id from pci_scan_device

We can reuse it for pciehp probing.

-v2: according to Kenji, fix crs timeout checking, and export the function
for later use when pciehp is compiled as a module.

Suggested-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9b03088f 21-Jan-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Make pci_rescan_bus handle add_list

This allows us to allocate resources to hotplug bridges during
remove/rescan.

We need to move the function to setup-bus.c so it can use
__pci_bus_size_bridges and __pci_bus_assign_resources directly to take
the add_list resource tracking list.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 2f320521 21-Jan-2012 Yinghai Lu <yinghai@kernel.org>

PCI: Make rescan bus increase bridge resource size if needed

Current rescan will not touch bridge MMIO and IO.

Try to reuse pci_assign_unassigned_bridge_resources(bridge) to update bridge
resources, if child devices need more resources.

Only do that for bridges whose children are all removed already; i.e. don't
release resources that could already be in use by drivers on child devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 71f6bd4a 29-Jan-2012 Yinghai Lu <yinghai.lu@oracle.com>

PCI: workaround hard-wired bus number V2

Fixes PCI device detection on IBM xSeries IBM 3850 M2 / x3950 M2
when using ACPI resources (_CRS).
This is default, a manual workaround (without this patch)
would be pci=nocrs boot param.

V2: Add dev_warn if the workaround is hit. This should reveal
how common such setups are (via google) and point to possible
problems if things are still not working as expected.
-> Suggested by Jan Beulich.

Cc: stable@vger.kernel.org
Tested-by: garyhade@us.ibm.com
Signed-off-by: Yinghai Lu <yinghai.lu@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 118faafa 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: remove pci_create_bus()

All users of pci_create_bus() have been converted to pci_create_root_bus(),
so remove it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 7e00fe2e 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: deprecate pci_scan_bus_parented()

Users of pci_scan_bus_parented() should be converted to use either
pci_scan_root_bus() (preferred, but also calls pci_bus_add_devices)
or
pci_create_root_bus()
pci_scan_child_bus()

Since pci_scan_bus_parented(), I'm marking it deprecated now and will
actually remove it later.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 1e39ae9f 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: convert pci_scan_bus_parented() to use pci_create_root_bus()

This converts pci_scan_bus_parented() to use pci_create_root_bus()
instead of pci_create_bus(). The new bus still has the default (incorrect)
resources, so this patch doesn't help fix that problem, but it does remove
one more use of pci_create_bus().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# de4b2f76 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: convert pci_scan_bus() to use pci_create_root_bus()

I plan to deprecate pci_scan_bus_parented(), so use pci_create_root_bus()
directly instead. pci_scan_bus() itself will be removed as soon as all
callers are gone, so this is just an interim step.

v2: export pci_scan_bus

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a2ebb827 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: add pci_scan_root_bus() that accepts resource list

"Early" and "header" quirks often use incorrect bus resources because they
see the default resources assigned by pci_create_bus(), before the
architecture fixes them up (typically in pcibios_fixup_bus()). Regions
reserved by these quirks end up with the wrong parents.

Here's the standard path for scanning a PCI root bus:

pci_scan_bus or pci_scan_bus_parented
pci_create_bus <-- A create with default resources
pci_scan_child_bus
pci_scan_slot
pci_scan_single_device
pci_scan_device
pci_setup_device
pci_fixup_device(early) <-- B
pci_device_add
pci_fixup_device(header) <-- C
pcibios_fixup_bus <-- D fill in correct resources

Early and header quirks at B and C use the default (incorrect) root bus
resources rather than those filled in at D.

This patch adds a new pci_scan_root_bus() function that sets the bus
resources correctly from a supplied list of resources.

I intend to remove pci_scan_bus() and pci_scan_bus_parented() after
fixing all callers.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 166c6370 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: add pci_create_root_bus() that accepts resource list

pci_create_bus() assigns ioport_resource and iomem_resource as the default
bus resources, i.e., the entire address space. Architectures fix these
later, typically in pcibios_fixup_bus() or after pci_scan_bus_parented()
returns, but code that runs in the interim sees incorrect resource
information.

This patch adds a new pci_create_root_bus() that sets the bus resources
correctly from a supplied list of resources.

I intend to remove pci_create_bus() after changing all callers.

Based on original patch by Deng-Cheng Zhu.

Reference: http://www.spinics.net/lists/mips/msg41654.html
Reference: https://lkml.org/lkml/2011/8/26/88
Signed-off-by: Deng-Cheng Zhu <dczhu@mips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a9d9f527 28-Oct-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: show host bridges and root bus resources

Show the bus number and resources for every root bus we create. This
will become more interesting when we supply the correct resources
instead of using the defaults (ioport_resource and iomem_resource).

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 68e35c9b 17-Nov-2011 Zac Storer <zac.3.14159@gmail.com>

PCI: fix a brace coding style issue in probe.c

Fixed a brace coding style issue.

Signed-off-by: Zac Storer <zac.3.14159@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a513a99a7 14-Oct-2011 Jon Mason <mason@myri.com>

PCI: Clean-up MPS debug output

Clean-up MPS debug output to make it a single line and aligned, thus
making it more readable for a large number of buses and devices in a
single system.

Suggested by Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 62f392ea 14-Oct-2011 Jon Mason <mason@myri.com>

PCI: enable MPS "performance" setting to properly handle bridge MPS

Rework the "performance" MPS option to configure the device MPS with the
smaller of the device MPSS or the bridge MPS (which is assumed to be
properly configured at this point to the largest allowable MPS based on
its parent bus).

Also, rework the MRRS setting to report an inability to set the MRRS to
a valid setting.

Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 5f39e670 03-Oct-2011 Jon Mason <mason@myri.com>

PCI: Disable MPS configuration by default

Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.

Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).

Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 1a4b1a41 13-Sep-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

pci: Don't crash when reading mpss from root complex

In pcie_find_smpss(), we have the following statement:

if (dev->is_hotplug_bridge && (!list_is_singular(&dev->bus->devices) ||
dev->bus->self->pcie_type != PCI_EXP_TYPE_ROOT_PORT))

The problem is that at least on my machine, this gets called for the
root complex (virtual P2P bridge), and dev->bus->self is NULL since
the parent bus for this is not itself anchor to a PCI device.

This adds the necessary NULL check.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jon Mason <mason@myri.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ed2888e9 08-Sep-2011 Jon Mason <mason@myri.com>

PCI: Remove MRRS modification from MPS setting code

Modifying the Maximum Read Request Size to 0 (value of 128Bytes) has
massive negative ramifications on some devices. Without knowing which
devices have this issue, do not modify from the default value when
walking the PCI-E bus in pcie_bus_safe mode. Also, make pcie_bus_safe
the default procedure.

Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Simon Kirby <sim@hostway.ca>
Tested-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-and-tested-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
References: https://bugzilla.kernel.org/show_bug.cgi?id=42162
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 5307f6d5 08-Sep-2011 Shyam Iyer <shyam.iyer.t@gmail.com>

Fix pointer dereference before call to pcie_bus_configure_settings

Commit b03e7495a862 ("PCI: Set PCI-E Max Payload Size on fabric")
introduced a potential NULL pointer dereference in calls to
pcie_bus_configure_settings due to attempts to access pci_bus self
variables when the self pointer is NULL.

To correct this, verify that the self pointer in pci_bus is non-NULL
before dereferencing it.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: Jon Mason <mason@myri.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# debc3b77 01-Aug-2011 Jon Mason <mason@myri.com>

PCI: export pcie_bus_configure_settings symbol

pcie_bus_configure_settings needs to be exported if the PCI hotplug
driver is being compiled as a module.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# b03e7495 20-Jul-2011 Jon Mason <mason@myri.com>

PCI: Set PCI-E Max Payload Size on fabric

On a given PCI-E fabric, each device, bridge, and root port can have a
different PCI-E maximum payload size. There is a sizable performance
boost for having the largest possible maximum payload size on each PCI-E
device. However, if improperly configured, fatal bus errors can occur.
Thus, it is important to ensure that PCI-E payloads sends by a device
are never larger than the MPS setting of all devices on the way to the
destination.

This can be achieved two ways:

- A conservative approach is to use the smallest common denominator of
the entire tree below a root complex for every device on that fabric.

This means for example that having a 128 bytes MPS USB controller on one
leg of a switch will dramatically reduce performances of a video card or
10GE adapter on another leg of that same switch.

It also means that any hierarchy supporting hotplug slots (including
expresscard or thunderbolt I suppose, dbl check that) will have to be
entirely clamped to 128 bytes since we cannot predict what will be
plugged into those slots, and we cannot change the MPS on a "live"
system.

- A more optimal way is possible, if it falls within a couple of
constraints:
* The top-level host bridge will never generate packets larger than the
smallest TLP (or if it can be controlled independently from its MPS at
least)
* The device will never generate packets larger than MPS (which can be
configured via MRRS)
* No support of direct PCI-E <-> PCI-E transfers between devices without
some additional code to specifically deal with that case

Then we can use an approach that basically ignores downstream requests
and focuses exclusively on upstream requests. In that case, all we need
to care about is that a device MPS is no larger than its parent MPS,
which allows us to keep all switches/bridges to the max MPS supported by
their parent and eventually the PHB.

In this case, your USB controller would no longer "starve" your 10GE
Ethernet and your hotplug slots won't affect your global MPS.
Additionally, the hotplugged devices themselves can be configured to a
larger MPS up to the value configured in the hotplug bridge.

To choose between the two available options, two PCI kernel boot args
have been added to the PCI calls. "pcie_bus_safe" will provide the
former behavior, while "pcie_bus_perf" will perform the latter behavior.
By default, the latter behavior is used.

NOTE: due to the location of the enablement, each arch will need to add
calls to this function. This patch only enables x86.

This patch includes a number of changes recommended by Benjamin
Herrenschmidt.

Tested-by: Jordan_Hargrave@dell.com
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 7b87c9df 14-Jun-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: remove printks about disabled bridge windows

I don't think there's enough value in the fact of a bridge window
being disabled to justify cluttering the dmesg log with it.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 28c6821a 14-Jun-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: fold pci_calc_resource_flags() into decode_bar()

decode_bar() and pci_calc_resource_flags() both looked at the PCI BAR
type information, and it's simpler to just do it all in one place.

decode_bar() sets IORESOURCE_IO, IORESOURCE_MEM, and IORESOURCE_MEM_64
as appropriate, so res->flags contains all the information pci_bar_type
does, so we don't need to test the pci_bar_type return value.

decode_bar() used to return pci_bar_type, which we no longer need. We
can simplify it a bit by returning the struct resource flags rather than
updating them internally.

In pci_update_resource(), there's no need to decode the BAR type bits
again; we can just test for IORESOURCE_MEM_64 directly.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 8d6a6a47 14-Jun-2011 Bjorn Helgaas <bhelgaas@google.com>

PCI: treat mem BAR type "11" (reserved) as 32-bit, not 64-bit, BAR

This fixes a minor regression where broken PCI devices that use the
reserved "11" memory BAR type worked before e354597cce but not after.

The low four bits of a memory BAR are "PTT0" where P=1 for prefetchable
BARs, and TT is as follows:

00 32-bit BAR, anywhere in lower 4GB
01 anywhere below 1MB (reserved as of PCI 2.2)
10 64-bit BAR
11 reserved

Prior to e354597cce, we treated "0100" as a 64-bit BAR and all others,
including prefetchable 64-bit BARs ("1100") as 32-bit BARs. The e354597cce
fix, which appeared in 2.6.28, treats "x1x0" as 64-bit BARs, so the
reserved "x110" types are treated as 64-bit instead of 32-bit.

This patch returns to treating the reserved "11" type as a 32-bit BAR and
adds a warning if we see it.

It also logs a note if we see a 1M BAR. This is not a warning, because
such hardware conforms to pre-PCI 2.2 spec, but I think it's worth noting
because Linux ignores the 1M restriction if it ever has to assign the BAR.

CC: Peter Chubb <peterc@gelato.unsw.edu.au>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35952
Reported-by: Jan Zwiegers <jan@radicalsystems.co.za>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# b1a98b69 01-Jun-2011 Tiejun Chen <tiejun.chen@windriver.com>

PCI: enumerate the PCI device only removed out PCI hieratchy of OS when re-scanning PCI

When hot-plugging a root bridge, we always prevent assigning a bus number
that already exists. This makes sure we don't step over an existing bus.
But sometimes we only remove PCI device in PCI hieratchy of OS, i,e.

echo 1 > /sys/bus/pci/devices/.../remove

but actually don't hotplug this device out the platform, so in this case
we still should re-scan this bus to enumerate this device when re-scanning
PCI again.

Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 98d9f30c8 10-Apr-2011 Benjamin Herrenschmidt <benh@kernel.crashing.org>

pci/of: Match PCI devices to OF nodes dynamically

powerpc has two different ways of matching PCI devices to their
corresponding OF node (if any) for historical reasons. The ppc64 one
does a scan looking for matching bus/dev/fn, while the ppc32 one does a
scan looking only for matching dev/fn on each level in order to be
agnostic to busses being renumbered (which Linux does on some
platforms).

This removes both and instead moves the matching code to the PCI core
itself. It's the most logical place to do it: when a pci_dev is created,
we know the parent and thus can do a single level scan for the matching
device_node (if any).

The benefit is that all archs now get the matching for free. There's one
hook the arch might want to provide to match a PHB bus to its device
node. A default weak implementation is provided that looks for the
parent device device node, but it's not entirely reliable on powerpc for
various reasons so powerpc provides its own.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michal Simek <monstr@monstr.eu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 5aceca9d 23-May-2011 David S. Miller <davem@davemloft.net>

PCI: Fix warning in drivers/pci/probe.c on sparc64

IO_SPACE_LIMIT is currently used in two ways:

1) As a way to mask I/O port values read out of PCI base address
registers. This value should be 64-bit.

2) As a value which is the upper limit for all I/O "ports" in the
system.

On sparc64 we store the full 64-bit physical I/O address in the
resources. For this reason we define IO_SPACE_LIMIT at a 64-bit
"all 1's".

This is the right value to use for ioport_resource.end and for the
check made in drivers/pcmcia/rsrc_nonstatic.c:adjust_io().

But in driver/pci/probe.c:__pci_read_base() we mask this against
a "u32" variable and thus get the following warning:

drivers/pci/probe.c: In function ¡__pci_read_base¢:
drivers/pci/probe.c:207: warning: large integer implicitly truncated to unsigned type

Fix this by using an explicit "u32" cast.

I considered changing sparc64 to define a 32-bit "all 1's" like
most other systems do, but this wouldn't work because the checks
in PCMCIA's rsrc_nonstatic.c would no longer be right since they
are testing against fully formed 64-bit resources. As described
above, on sparc64 such resources will hold full 64-bit physical
I/O addresses, not bus-centric 32-bit ones.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# dc2c2c9d 12-May-2011 Yinghai Lu <yinghai@kernel.org>

PCI/sysfs: move bus cpuaffinity to class dev_attrs

Requested by Greg KH to fix a race condition in the creating of PCI bus
cpuaffinity files.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# b9d320fc 12-May-2011 Yinghai Lu <yinghai@kernel.org>

PCI: add rescan to /sys/.../pci_bus/.../

After remove the device from /sys, we have to rescan all or
find out the bridge and access /sys../device/rescan there.

this patch add /sys/.../pci_bus/.../rescan. So user can rescan more easy.
that is more clean and easy to understand.

like after remove 0000:c4:00.0, you can rescan 0000:c4 directly.

-v2: According to Jesse, use function instead of exposing attr, so could hide
#ifdef in header file.
also add code to remove rescan file in remove path.
-v3: GregKH pointed out that we should use dev_attrs to avoid racing.
So add pcibus_attrs and make it to be member of pcibus_attrs.
-v4: Change name to pcibus_dev_attrs according to GregKH

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 7c867c88 24-Jan-2011 Jesper Juhl <jj@chaosbits.net>

PCI: Avoid potential NULL pointer dereference in pci_scan_bridge

pci_add_new_bus() calls pci_alloc_child_bus() which calls pci_alloc_bus()
that allocates memory dynamically with kzalloc(). The return value of
kzalloc() is the pointer that's eventually returned from
pci_add_new_bus(), so since kzalloc() can fail and return NULL so can
pci_add_new_bus(). Thus we may end up dereferencing a NULL pointer in
drivers/pci/probe.c::pci_scan_bridge(). Seems to me we should test for
this and bail out if it happens rather than crashing.
Also removed some trailing whitespace that bugged me while looking at
this.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 2c6413ae 29-Sep-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: log vendor/device ID always

Previously we had to have CONFIG_PCI_DEBUG=y or CONFIG_DYNAMIC_DEBUG=y
to turn on this printk, but I think the IDs are valuable enough that it's
worth putting them in the log always.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 253d2e54 16-Jul-2010 Jacob Pan <jacob.jun.pan@linux.intel.com>

PCI: disable mmio during bar sizing

It is a known issue that mmio decoding shall be disabled while doing PCI
bar sizing. Host bridge and other devices (PCI PIC) shall be excluded for
certain platforms. This patch mainly comes from Mathew Willcox's
patch in http://kerneltrap.org/mailarchive/linux-kernel/2007/9/13/258969.

A new flag bit "mmio_alway_on" is added to pci_dev with the intention that
devices with their mmio decoding cannot be disabled during BAR sizing shall
have this bit set, preferrablly in their quirks.

Without this patch, Intel Moorestown platform graphics unit will be
corrupted during bar sizing activities.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# affb72c3 18-May-2010 Huang Ying <ying.huang@intel.com>

ACPI, APEI, PCIE AER, use general HEST table parsing in AER firmware_first setup

Now, a dedicated HEST tabling parsing code is used for PCIE AER
firmware_first setup. It is rebased on general HEST tabling parsing
code of APEI. The firmware_first setup code is moved from PCI core to
AER driver too, because it is only AER related.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Len Brown <len.brown@intel.com>


# 45aa23b4 22-Apr-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: revert broken device warning

This reverts c519a5a7dab2d. That change added a warning about devices that
didn't respond correctly when sizing BARs, which helped diagnose broken
devices. But the test wasn't specific enough, so it also complained about
working devices with zero-size BARs, e.g.,
https://bugzilla.kernel.org/show_bug.cgi?id=15822

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# c519a5a7 19-Mar-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: complain about devices that seem to be broken

If we can tell that a device isn't working correctly, we should tell
the user to make debugging easier. Otherwise, it can take a lot of
work to determine whether the problem is in the driver, PCMCIA, PCI,
hardware, etc., as in http://bugzilla.kernel.org/show_bug.cgi?id=12006

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 7b8ff6da 16-Mar-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: make disabled window printk style match the enabled ones

No functional change; this just tweaks the changes from 349e1823a405
so the new printks for disabled PCI-to-PCI bridge windows match the
ones for the enabled windows.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 99ddd552 16-Mar-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: break out primary/secondary/subordinate for readability

No functional change; just add names for the primary/secondary/subordinate
bus numbers read from config space rather than repeatedly masking/shifting.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a1e4d72c 08-Feb-2010 Rafael J. Wysocki <rjw@rjwysocki.net>

PM: Allow PCI devices to suspend/resume asynchronously

Set power.async_suspend for all PCI devices and PCIe port services,
so that they can be suspended and resumed in parallel with other
devices they don't depend on in a known way (i.e. devices which are
not their parents or children).

This only affects the "regular" suspend and resume stages, which
means in particular that the restoration of the PCI devices' standard
configuration registers during resume will still be carried out
synchronously (at the "early" resume stage).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>


# 2fe2abf8 23-Feb-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: augment bus resource table with a list

Previously we used a table of size PCI_BUS_NUM_RESOURCES (16) for resources
forwarded to a bus by its upstream bridge. We've increased this size
several times when the table overflowed.

But there's no good limit on the number of resources because host bridges
and subtractive decode bridges can forward any number of ranges to their
secondary buses.

This patch reduces the table to only PCI_BRIDGE_RESOURCE_NUM (4) entries,
which corresponds to the number of windows a PCI-to-PCI (3) or CardBus (4)
bridge can positively decode. Any additional resources, e.g., PCI host
bridge windows or subtractively-decoded regions, are kept in a list.

I'd prefer a single list rather than this split table/list approach, but
that requires simultaneous changes to every architecture. This approach
only requires immediate changes where we set up (a) host bridges with more
than four windows and (b) subtractive-decode P2P bridges, and we can
incrementally change other architectures to use the list.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 2adf7516 23-Feb-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: read bridge windows before filling in subtractive decode resources

No functional change; this fills in the bus subtractive decode resources
after reading the bridge window information rather than before. Also,
print out the subtractive decode resources as we already do for the
positive decode windows.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# fa27b2d1 23-Feb-2010 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: split up pci_read_bridge_bases()

No functional change; this breaks up pci_read_bridge_bases() into separate
pieces for the I/O, memory, and prefetchable memory windows, similar to how
Yinghai recently split up pci_setup_bridge() in 68e84ff3bdc.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# cd81e1ea 22-Jan-2010 Yinghai Lu <yinghai@kernel.org>

PCI: reject mmio ranges starting at 0 on pci_bridge read

We already track unassigned resources in struct resource, and this
prevents us from overwriting resource flags and info in the unassigned
case.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 4fb88c1a 17-Jan-2010 Matthew Wilcox <willy@infradead.org>

PCI: Make pci_scan_slot more robust

Yinghai pointed out that the new pci_scan_slot() crashes when called
on an ARI-capable slot that is empty. Fix this by exiting early from
pci_scan_slot if there is no device in the slot.

Also make next_ari_func() robust against devices not existing in case
the ARI capability is corrupt. ARI also requires that the devices be
listed in order, so if we find a function listed that is out of order,
stop scanning to prevent loops.

Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9dfd97fe 13-Dec-2009 Matthew Wilcox <willy@infradead.org>

PCI: Add support for reporting PCIe 3.0 speeds

Add the 8.0 GT/s speed.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 45b4cdd5 13-Dec-2009 Matthew Wilcox <willy@infradead.org>

PCI: Add support for AGP in cur/max bus speed

Take advantage of some gaps in the table to fit in support for AGP speeds.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9be60ca0 13-Dec-2009 Matthew Wilcox <willy@infradead.org>

PCI: Add support for detection of PCIe and PCI-X bus speeds

Both PCIe and PCI-X bridges report their secondary bus speed in their
respective capabilities.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 3749c51a 13-Dec-2009 Matthew Wilcox <willy@infradead.org>

PCI: Make current and maximum bus speeds part of the PCI core

Move the max_bus_speed and cur_bus_speed into the pci_bus. Expose the
values through the PCI slot driver instead of the hotplug slot driver.
Update all the hotplug drivers to use the pci_bus instead of their own
data structures.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# f07852d6 13-Dec-2009 Matthew Wilcox <willy@infradead.org>

PCI: Rewrite pci_scan_slot

The Alternate Routing-ID Interpretation capability allows a single device
to have up to 256 functions. They can be populated sparsely, so the
current technique of scanning every eighth function is not guaranteed
to find them all. By introducing a 'next_fn' function pointer, we can
use the linked list of functions in the ARI capability to scan all the
functions which exist.

We can then speed up the pci_scan_slot by skipping the scan of subsequent
devfns for PCIe devices which are the direct children of Root Ports or
Downstream Ports. These devices are only permitted to implement device
0, unless they are ARI devices, in which case they'll be scanned by the
ARI code above.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# bb209c82 26-Jan-2010 Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/pci: Add calls to set_pcie_port_type() and set_pcie_hotplug_bridge()

We are missing these when building the pci_dev from scratch off
the Open Firmware device-tree

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 5d990b62 04-Dec-2009 Chris Wright <chrisw@sous-sol.org>

PCI: add pci_request_acs

Commit ae21ee65e8bc228416bbcc8a1da01c56a847a60c "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.

Add a function to pci core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; iommu has only been detected not initialized.

Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.

Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 06a1cbaf 10-Nov-2009 Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

PCI: use pci_pcie_cap() in pci core

Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 0efea000 04-Nov-2009 Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

PCI: cache PCIe capability offset

There are a lot of codes that searches PCI express capability offset
in the PCI configuration space using pci_find_capability(). Caching it
in the struct pci_dev will reduce unncecessary search. This patch adds
an additional 'pcie_cap' fields into struct pci_dev, which is
initialized at pci device scan time (in set_pcie_port_type()).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 865df576 04-Nov-2009 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: improve discovery/configuration messages

This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.

Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 0207c356 04-Nov-2009 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: replace pr_debug with dev_dbg

Since we have a struct device, we might as well use dev_printk. Note that
both pr_debug() and dev_dbg() are completely compiled out unless DEBUG or
DYNAMIC_DEBUG is defined.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# c7dabef8 27-Oct-2009 Bjorn Helgaas <bjorn.helgaas@hp.com>

vsprintf: use %pR, %pr instead of %pRt, %pRf

Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.

The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf

[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# bc577d2b 06-Oct-2009 Gabe Black <gabe.black@ni.com>

PCI: populate subsystem vendor and device IDs for PCI bridges

Change to populate the subsystem vendor and subsytem device IDs for
PCI-PCI bridges that implement the PCI Subsystem Vendor ID capability.
Previously bridges left subsystem vendor IDs unpopulated.

Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 05843961 02-Nov-2009 Matt Domsch <Matt_Domsch@dell.com>

PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST. PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.

Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.

However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle. One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.

Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions. The aer driver should honor this, and
not attach itself to devices noted as such.

Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode. Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.

The HEST parsing code may be replaced in the future by a more
feature-rich implementation. This patch provides the minimum needed
to prevent breakage until that implementation is available.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 1ed67439 29-Oct-2009 Michael S. Tsirkin <mst@redhat.com>

PCI: fix nit in ROM BAR size probing

When probing for ROM BAR size, we should not change bits 1:10 in this
BAR, because these bits are marked as "reserved for future use" in PCI
spec, so changing them might have side effects.

No such issue for I/O or memory, as there is an implementation note in
PCI spec which explicitly allows writing 0xfffffffff there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# df0e97c6 07-Oct-2009 Allen Kay <allen.m.kay@intel.com>

PCI: add xen dom0 checking before ACS initialization

This patch is predicated on Jeremy's patch in include/xen/xen.h. It'll
prevent ACS init unless the platform has both an IOMMU and we're running
as dom0.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# ae21ee65 07-Oct-2009 Allen Kay <allen.m.kay@intel.com>

PCI: acs p2p upsteram forwarding enabling

Note: dom0 checking in v4 has been separated out into 2/2.

This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:

1) Unintentional failure caused by guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the PCI
transaction to go to the matching downstream port instead of go to the
root complex to get translated by VT-d as it should be.

2) Malicious guest software intentionally attacks another downstream
PCIe device by programming the DMA address into the assigned device
that matches memory address range of the downstream PCIe port.

We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled. This patch is intended to work for both KVM and Xen environments.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a369c791 06-Oct-2009 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: print resources consistently with %pRt

This uses %pRt to print additional resource information (type, size,
prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 4b77b0a2 09-Sep-2009 Rafael J. Wysocki <rjw@rjwysocki.net>

PCI: Clear saved_state after the state has been restored

Some PCI devices fail if their standard configuration registers are
restored twice in a row. Prevent this from happening by making
pci_restore_state() clear the saved_state flag of the device right
after the device's standard configuration registers have been
populated with the previously saved values.

Simplify PCI PM callbacks by removing the direct clearing of
state_saved from them, as it shouldn't be necessary any more (except
in pci_pm_thaw(), where it has to be cleared, so that the values saved
during the "freeze" phase of hibernation are not used later by mistake).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 28760489 09-Sep-2009 Eric W. Biederman <ebiederm@aristanetworks.com>

PCI: pcie: Ensure hotplug ports have a minimum number of resources

In general a BIOS may goof or we may hotplug in a hotplug controller.
In either case the kernel needs to reserve resources for plugging
in more devices in the future instead of creating a minimal resource
assignment.

We already do this for cardbus bridges I am just adding a variant
for pcie bridges.

v2: Make testing for pcie hotplug bridges based on a flag.

So far we only set the flag for pcie but a header_quirk
could easily be added for the non-standard pci hotplug
bridges.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# d0b8cbed 07-Aug-2009 Yinghai Lu <yinghai@kernel.org>

PCI: print out pref if mmio is prefetchable

We already print it out for pci bridges, so also print it out for pci devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a7db5040 22-Jun-2009 Alex Chiang <achiang@hp.com>

PCI: remove pcibios_scan_all_fns()

This was #define'd as 0 on all platforms, so let's get rid of it.

This change makes pci_scan_slot() slightly easier to read.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Tony Luck <tony.luck@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9fc39256 26-May-2009 Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

PCI: use pci_is_root_bus() in pci_read_bridge_bases()

Use pci_is_root_bus() in pci_read_bridge_bases() to check if the pci
bus is root, for code consistency.

Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 1f82de10 23-Apr-2009 Yinghai Lu <yinghai@kernel.org>

PCI/x86: don't assume prefetchable ranges are 64bit

We should not assign 64bit ranges to PCI devices that only take 32bit
prefetchable addresses.

Try to set IORESOURCE_MEM_64 in 64bit resource of pci_device/pci_bridge
and make the bus resource only have that bit set when all devices under
it support 64bit prefetchable memory. Use that flag to allocate
resources from that range.

Reported-by: Yannick <yannick.roehlly@free.fr>
Reviewed-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# f79b1b14 27-May-2009 Yu Zhao <yu.zhao@intel.com>

PCI: use fixed-up device class when configuring device

The device class may be changed after the fixup, so re-read the class
value from pci_dev when configuring the device. Otherwise some devices
such as JMicron SATA controller won't work.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 0bb1be3e 16-Apr-2009 Matthew Wilcox <willy@infradead.org>

x86/PCI: Move set_pci_bus_resources_arch_default into arch/x86

Commit 30a18d6c3f1e774de656ebd8ff219d53e2ba4029 introduced a new
function to set the PCI bus resources. Unfortunately, neither the
author, nor the committers seemed to know that we already have somewhere
to do that -- pcibios_fixup_bus(). This patch moves the hook (used only
by the K8 code) into x86-specific code where it should have been in the
first place.

Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 5446a6bd 01-Apr-2009 Alex Chiang <achiang@hp.com>

PCI: annotate pci_rescan_bus as __ref, not __devinit

pci_rescan_bus was annotated as __devinit, which is wrong,
because it will never be part of device initialization.
Howevever, we can't simply drop the annotation, because then we
get section warnings about calling pci_scan_child_bus (which is
correctly marked as __devinit).

pci_rescan_bus will only get built when CONFIG_HOTPLUG is set,
meaning that __devinit is a nop, so we know that pci_scan_child_bus
has not been freed.

Annotate as __ref to silence modpost.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 853346e4 21-Mar-2009 Yu Zhao <yu.zhao@intel.com>

PCI: fix conflict between SR-IOV and config space sizing

New pci_cfg_space_size() needs invalid pdev->class, put it in the
right place in the pci_setup_device().

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 705b1aaa 20-Mar-2009 Alex Chiang <achiang@hp.com>

PCI: Introduce /sys/bus/pci/rescan

This interface allows the user to force a rescan of all PCI buses
in system, and rediscover devices that have been removed earlier.

pci_bus_attrs implementation from Trent Piepho.

Thanks to Vegard Nossum for discovering locking issues with the
sysfs interface.

Cc: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 3ed4fd96 20-Mar-2009 Alex Chiang <achiang@hp.com>

PCI: Introduce pci_rescan_bus()

This API is used by the PCI core to rescan a bus and rediscover
newly added devices.

Over time, it is expected that the various PCI hotplug drivers
will migrate to this interface and away from the old
pci_do_scan_bus() interface.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 74710ded 20-Mar-2009 Alex Chiang <achiang@hp.com>

PCI: always scan child buses

While scanning bridges, we stop our scan if we encounter a bus
that we've seen before, to work around some buggy chipsets. This
is a good idea, but prevents us from fully scanning the PCI bus
at a future time (to find newly hot-added devices, for example).

Change the logic so that we skip _re-adding_ an existing bus
that we've seen before, but also allow the scan to descend to
all child buses.

Now that we're potentially scanning our child buses again, we
also need to be sure not to attempt re-initializing their BARs
so we avoid that.

This patch lays the groundwork to allow the user to issue a
rescan of the PCI bus at any time.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 1b69dfc6 20-Mar-2009 Trent Piepho <xyzzy@speakeasy.org>

PCI: pci_scan_slot() returns newly found devices

pci_scan_slot() has been rewritten to be less complex and will now
return the number of *new* devices found.

Existing callers need not worry because they already assume that
they can't call pci_scan_slot() on an already-scanned slot.

Thus, there is no semantic change for existing callers: returning
newly found devices (this patch) is exactly equal to returning all
found devices (before this patch).

This patch adds some more groundwork to allow us to rescan the
PCI bus during runtime to discover newly added devices.

Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 90bdb311 20-Mar-2009 Trent Piepho <xyzzy@speakeasy.org>

PCI: don't scan existing devices

pci_scan_single_device is supposed to add newly discovered
devices to pci_bus->devices, but doesn't check to see if the
device has already been added. This can cause problems if we ever
want to use this interface to rescan the PCI bus.

If the device is already added, just return it.

Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 480b93b7 19-Mar-2009 Yu Zhao <yu.zhao@intel.com>

PCI: centralize device setup code

Move the device setup stuff into pci_setup_device() which will be used
to setup the Virtual Function later.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a28724b0 19-Mar-2009 Yu Zhao <yu.zhao@intel.com>

PCI: reserve bus range for SR-IOV device

Reserve the bus number range used by the Virtual Function when
pcibios_assign_all_busses() returns true.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# d1b054da 19-Mar-2009 Yu Zhao <yu.zhao@intel.com>

PCI: initialize and release SR-IOV capability

If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# dfadd9ed 08-Mar-2009 Yinghai Lu <yinghai@kernel.org>

PCI/x86: detect host bridge config space size w/o using quirks

Many host bridges support a 4k config space, so check them directy
instead of using quirks to add them.

We only need to do this extra check for host bridges at this point,
because only host bridges are known to have extended address space
without also having a PCI-X/PCI-E caps. Other devices with this
property could be done with quirks (if there are any).

As a bonus, we can remove the quirks for AMD host bridges with family
10h and 11h since they're not needed any more.

With this patch, we can get correct pci cfg size of new Intel CPUs/IOHs
with host bridges.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 6a3b3e26 15-Mar-2009 Geert Uytterhoeven <geert@linux-m68k.org>

PCI: Use kzalloc() in pci_create_bus()

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# f92d4e29 16-Feb-2009 Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>

PCI: fix wrong assumption in pci_read_bridge_bases

Current pci_read_bridge_bases() has an assumption that pci_bus->self
is NULL on the pci root bus (It checks pci_bus->self to see if the pci
bus is root bus). But is might not true on some platforms. We must
check pci_bus->parent instead.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# eb9c39d0 17-Dec-2008 Jesse Barnes <jbarnes@virtuousgeek.org>

PCI: set device wakeup capable flag if platform support is present

When PCI devices are initialized, we check whether they support PCI PM
caps and set the device can_wakeup flag if so. However, some devices
may have platform provided wakeup events rather than PCI PME signals, so
we need to set can_wakeup in that case too. Doing so should allow
wakeups from many more devices, especially on cost constrained systems.

Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Joseph Chan <JosephChan@via.com.tw>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 3789fa8a 21-Nov-2008 Yu Zhao <yu.zhao@intel.com>

PCI: allow pci_alloc_child_bus() to handle a NULL bridge

Allow pci_alloc_child_bus() to allocate buses without bridge devices.
Some SR-IOV devices can occupy more than one bus number, but there is no
explicit bridges because that have internal routing mechanism.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 0b400c7e 21-Nov-2008 Yu Zhao <yu.zhao@intel.com>

PCI: export __pci_read_base()

Export __pci_read_base() so it can be used by whole PCI subsystem.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# fde09c6d 21-Nov-2008 Yu Zhao <yu.zhao@intel.com>

PCI: define PCI resource names in an 'enum'

This patch moves all definitions of the PCI resource names to an 'enum',
and also replaces some hard-coded resource variables with symbol
names. This change eases introduction of device specific resources.

Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 63f4898a 07-Dec-2008 Rafael J. Wysocki <rjw@rjwysocki.net>

PCI: handle PCI state saving with interrupts disabled

Since interrupts will soon be disabled at PCI resume time, we need to
pre-allocate memory to save/restore PCI config space (or use GFP_ATOMIC,
but this is safer).

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 1a927133 29-Oct-2008 Kay Sievers <kay.sievers@vrfy.org>

PCI: struct device - replace bus_id with dev_name(), dev_set_name()

This patch is part of a larger patch series which will remove
the "char bus_id[20]" name string from struct device. The device
name is managed in the kobject anyway, and without any size
limitation, and just needlessly copied into "struct device".

To set and read the device name dev_name(dev) and dev_set_name(dev)
must be used. If your code uses static kobjects, which it shouldn't
do, "const char *init_name" can be used to statically provide the
name the registered device should have. At registration time, the
init_name field is cleared, to enforce the use of dev_name(dev) to
access the device name at a later time.

We need to get rid of all occurrences of bus_id in the entire tree
to be able to enable the new interface. Please apply this patch,
and possibly convert any remaining remaining occurrences of bus_id.

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-Off-By: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 588235bb53 04-Jan-2009 Mike Travis <travis@sgi.com>

cpumask: update pci_bus_show_cpuaffinity to use new cpumask API

Impact: use new cpumask API to reduce stack usage

Replace the local cpumask_t variable with a pointer to the
const cpumask that needs to be printed.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 29c0177e 13-Dec-2008 Rusty Russell <rusty@rustcorp.com.au>

cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.

Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com


# a491913f 13-Oct-2008 Zhao, Yu <yu.zhao@intel.com>

PCI: remove unused resource assignment in pci_read_bridge_bases()

This cleanup removes the resource assignment in pci_read_bridge_bases()
since it has taken care by pci_alloc_child_bus() when allocating the bus:

/* Set up default resource pointers and names.. */
for (i = 0; i < PCI_BRIDGE_RES_NUM; i++) {
child->resource[i] = &bridge->resource[PCI_BRIDGE_RESOURCES+i];
child->resource[i]->name = child->name;
}

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a1c19894 20-Oct-2008 Benjamin Herrenschmidt <benh@kernel.crashing.org>

PCI: Workaround invalid P2P bridge bus numbers

Some firmware fail to properly configure P2P bridges, leaving them
with invalid bus numbers. In some cases, this happens on some embedded
4xx boards as the result of the kernel allocating different bus space
than the firmware does to host bridges while not setting
pcibios_assign_all_busses() for various reasons. In other cases, it can
just be bogus firmware.

This adds some sanity checking to the PCI probing code. If a bridge is
found whose primary bus number doesn't match the bus it's sitting on,
or whose secondary bus number not strictly above it's primary bus
number, then the bridge bus numbers are deconfigured in the first pass
of pci_scan_bridge() to be re-assigned in the second pass.

Tested-by: "Ayman El-Khashab" <AymanE@tanisys.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# f19aeb1f 03-Oct-2008 Benjamin Herrenschmidt <benh@kernel.crashing.org>

PCI: Add ability to mmap legacy_io on some platforms

This adds the ability to mmap legacy IO space to the legacy_io files
in sysfs on platforms that support it. This will allow to clean up
X to use this instead of /dev/mem for legacy IO accesses such as
those performed by Int10.

While at it I moved pci_create/remove_legacy_files() to pci-sysfs.c
where I think they belong, thus making more things statis in there
and cleaned up some spurrious prototypes in the ia64 pci.h file

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# f393d9b1 11-Oct-2008 Vincent Legoll <vincent.legoll@gmail.com>

PCI: probing debug message uniformization

This patch uniformizes PCI probing debug boot messages with dev_printk()
intead of manual printk()

It changes adress range output from [%llx, %llx] to [%#llx-%#llx], like
in pci_request_region().

For example, it goes from the mixed-style:

PCI: 0000:00:1b.0 reg 10 64bit mmio: [f4280000, f4283fff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold

to uniform:

pci 0000:00:1b.0: reg 10 64bit mmio: [0xf4280000-0xf4283fff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold

This patch has been runtime tested, boot log messages diffed, everything
looks OK.

Acked-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Vincent Legoll <vincent.legoll@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 58c3a727 14-Oct-2008 Yu Zhao <yu.zhao@intel.com>

PCI: support PCIe ARI capability

This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.

The ARI capability extends the Function Number field of the PCI Express
Endpoint by reusing the Device Number which is otherwise hardwired to 0.
With ARI, an Endpoint can have up to 256 functions.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 201de56e 13-Oct-2008 Zhao, Yu <yu.zhao@intel.com>

PCI: centralize the capabilities code in probe.c

This patch centralizes the initialization and release functions of
various PCI capabilities in probe.c, which makes the introduction
of new capability support functions cleaner in the future.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# e354597c 12-Oct-2008 Peter Chubb <peterc@gelato.unsw.edu.au>

PCI: fix 64-vbit prefetchable memory resource BARs

Since patch 6ac665c63dcac8fcec534a1d224ecbb8b867ad59 my infiniband
controller hasn't worked. This is because it has 64-bit prefetchable
memory, which was mistakenly being taken to be 32-bit memory. The
resource flags in this case are PCI_BASE_ADDRESS_MEM_TYPE_64 |
PCI_BASE_ADDRESS_MEM_PREFETCH.

This patch checks only for the PCI_BASE_ADDRESS_MEM_TYPE_64 bit; thus
whether the region is prefetchable or not is ignored. This fixes my
Infiniband.

Reviewed-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 557848c3 13-Oct-2008 Zhao, Yu <yu.zhao@intel.com>

PCI: replace cfg space size (256/4096) by macros.

This is a cleanup that changes all PCI configuration space size
representations to the macros (PCI_CFG_SPACE_SIZE and
PCI_CFG_SPACE_EXP_SIZE). And the macros are also moved from
drivers/pci/probe.c to drivers/pci/pci.h.

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# cef354db 02-Sep-2008 Alex Chiang <achiang@hp.com>

PCI: connect struct pci_dev to struct pci_slot

The introduction of struct pci_slot (f46753c5e354b857b20ab8e0fe7b25)
added a struct pci_slot pointer to struct pci_dev, but we forgot to
associate the two.

Connect the two structs together; the interesting portions of the object
lifetimes are:

- when a new pci_slot is created, connect it to the appropriate
pci_dev's. A single pci_slot may be associated with multiple
pci_dev's, e.g. any multi-function PCI device.

- when a pci_slot is released, look for all the pci_dev's it was
associated with, and set their pci_slot pointers to NULL

- when a pci_dev is created, look for slots to associate with.

Note -- when a pci_dev is released, we don't need to do any bookkeeping,
since pci_slot's do not have pointers to pci_dev's.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 34a2e15e 25-Aug-2008 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: follow lspci device/vendor style

Use "[%04x:%04x]" for PCI vendor/device IDs to follow the format
used by lspci(8).

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 096e6f67 19-Oct-2008 Benjamin Herrenschmidt <benh@kernel.crashing.org>

pci: Use new %pR to print resource ranges

This converts things in drivers/pci to use %pR to printout the
content of a struct resource instead of hand-casted %llx or
other variants.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 99178b03 26-Aug-2008 Greg Kroah-Hartman <gregkh@suse.de>

Driver core: add bus_sort_breadthfirst() function

The PCI core wants to reorder the devices in the bus list. So move this
functionality out of the pci core and into the driver core so that
anyone else can also do this if needed. This also lets us change how
struct device is attached to drivers in the future without messing with
the PCI core.

Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 395a125c 09-Sep-2008 Yinghai Lu <yhlu.kernel@gmail.com>

PCI: re-add debug prints for unmodified BARs

Print out for device BAR values before the kernel tries to update them.
Also make related output use KERN_DEBUG.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 4ca8a772 22-Aug-2008 Johann Felix Soden <johfel@users.sourceforge.net>

PCI: Fix printk warnings in probe.c

The cleaned up resource code in probe.c introduced some warnings:
drivers/pci/probe.c: In function 'pci_read_bridge_bases':
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:386: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'resource_size_t'
drivers/pci/probe.c:398: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 4 has type 'resource_size_t'
drivers/pci/probe.c:434: warning: format '%llx' expects type 'long long unsigned int', but argument 5 has type 'resource_size_t'

So fix them up.

Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# a844158a 06-Aug-2008 Simon Horman <horms@verge.net.au>

PCI: check the return value of device_create_bin_file() in pci_create_bus()

Check the return value of device_create_bin_file in pci_create_bus and
unwind if necessary. Don't propagate error to caller, as failure to create
these files shouldn't prevent PCI from being initialised. Instead, just
log a warning.

Cc: Sven Wegener <sven.wegener@stealer.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 149e1637 22-Jul-2008 Shaohua Li <shaohua.li@intel.com>

PCI: disable ASPM on pre-1.1 PCIe devices

Disable ASPM on pre-1.1 PCIe devices, as many of them don't implement it
correctly.

Tested-by: Jack Howarth <howarth@bromo.msbb.uc.edu>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# cc5499c3 28-Jul-2008 Matthew Wilcox <willy@infradead.org>

PCI: handle 64-bit resources better on 32-bit machines

If the kernel is configured to support 64-bit resources on a 32-bit
machine, we can support 64-bit BARs properly. Just change the condition
to check sizeof(resource_size_t) instead of BITS_PER_LONG.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 6ac665c6 28-Jul-2008 Matthew Wilcox <willy@infradead.org>

PCI: rewrite PCI BAR reading code

Factor out the code to read one BAR from the loop in pci_read_bases into
a new function, __pci_read_base. The new code is slightly more
readable, better commented and removes the ifdef.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# eb9d0fe4 06-Jul-2008 Rafael J. Wysocki <rjw@rjwysocki.net>

PCI ACPI: Rework PCI handling of wake-up

* Introduce function acpi_pm_device_sleep_wake() for enabling and
disabling the system wake-up capability of devices that are power
manageable by ACPI.

* Introduce function acpi_bus_can_wakeup() allowing other (dependent)
subsystems to check if ACPI is able to enable the system wake-up
capability of given device.

* Introduce callback .sleep_wake() in struct pci_platform_pm_ops and
for the ACPI PCI 'driver' make it use acpi_pm_device_sleep_wake().

* Introduce callback .can_wakeup() in struct pci_platform_pm_ops and
for the ACPI 'driver' make it use acpi_bus_can_wakeup().

* Move the PME# handlig code out of pci_enable_wake() and split it
into two functions, pci_pme_capable() and pci_pme_active(),
allowing the caller to check if given device is capable of
generating PME# from given power state and to enable/disable the
device's PME# functionality, respectively.

* Modify pci_enable_wake() to use the new ACPI callbacks and the new
PME#-related functions.

* Drop the generic .platform_enable_wakeup() callback that is not
used any more.

* Introduce device_set_wakeup_capable() that will set the
power.can_wakeup flag of given device.

* Rework PCI device PM initialization so that, if given device is
capable of generating wake-up events, either natively through the
PME# mechanism, or with the help of the platform, its
power.can_wakeup flag is set and its power.should_wakeup flag is
unset as appropriate.

* Make ACPI set the power.can_wakeup flag for devices found to be
wake-up capable by it.

* Make the ACPI wake-up code enable/disable GPEs for devices that
have the wakeup.flags.prepared flag set (which means that their
wake-up power has been enabled).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# eebfcfb5 02-Jul-2008 Greg Kroah-Hartman <gregkh@suse.de>

PCI: handle pci_name() being const

This changes pci_setup_device to handle pci_name() now returning a
constant string.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 8b285ce8 27-Jun-2008 David Howells <dhowells@redhat.com>

PCI: fix pci_setup_device()'s sprinting into a const buffer

Make pci_setup_device() write the bus ID directly into the allotted storage,
rather than using pci_name() as the address as that now returns a const
pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 80ccba11 13-Jun-2008 Bjorn Helgaas <bjorn.helgaas@hp.com>

PCI: use dev_printk when possible

Convert printks to use dev_printk().

I converted pr_debug() to dev_dbg(). Both use KERN_DEBUG and are enabled
only when DEBUG is defined.

I converted printk(KERN_DEBUG) to dev_printk(KERN_DEBUG), not to dev_dbg(),
because dev_dbg() is only enabled when DEBUG is defined.

I converted DBG(KERN_INFO) (only in setup-bus.c) to dev_info(). The DBG()
name makes it sound like debug, but it's been enabled forever, so dev_info()
preserves the previous behavior.

I tried to make the resource assignment formats more consistent, e.g.,
"BAR %d: got res [%#llx-%#llx] bus [%#llx-%#llx] flags %#lx\n"
instead of sometimes using "start-end" and sometimes using "size@start".
I'm not attached to one or the other; I'd just like them consistent.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 9bf8a1a7 23-Jun-2008 Yinghai Lu <yhlu.kernel@gmail.com>

pci: debug extra pci resources range

Signed-off-by: Ingo Molnar <mingo@elte.hu>


# f46753c5 10-Jun-2008 Alex Chiang <achiang@hp.com>

PCI: introduce pci_slot

Currently, /sys/bus/pci/slots/ only exposes hotplug attributes when a
hotplug driver is loaded, but PCI slots have attributes such as address,
speed, width, etc. that are not related to hotplug at all.

Introduce pci_slot as the primary data structure and kobject model.
Hotplug attributes described in hotplug_slot become a secondary
structure associated with the pci_slot.

This patch only creates the infrastructure that allows the separation of
PCI slot attributes and hotplug attributes. In this patch, the PCI
hotplug core remains the only user of this infrastructure, and thus,
/sys/bus/pci/slots/ will still only become populated when a hotplug
driver is loaded.

A later patch in this series will add a second user of this new
infrastructure and demonstrate splitting the task of exposing pci_slot
attributes from hotplug_slot attributes.

- Make pci_slot the primary sysfs entity. hotplug_slot becomes a
subsidiary structure.
o pci_create_slot() creates and registers a slot with the PCI core
o pci_slot_add_hotplug() gives it hotplug capability

- Change the prototype of pci_hp_register() to take the bus and
slot number (on parent bus) as parameters.

- Remove all the ->get_address methods since this functionality is
now handled by pci_slot directly.

[achiang@hp.com: rpaphp-correctly-pci_hp_register-for-empty-pci-slots]
Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: make headers_check happy]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: fix typo in #include]
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Cc: Greg KH <greg@kroah.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 49db1399 12-May-2008 Zhao Yakui <yakui.zhao@intel.com>

PCI: Disable PME during PCI scan

If a device supports #PME and can generate PME events from D0, we may see
superfluous events before a driver is loaded (drivers should only enable PME as
needed), preventing suspend from working if the corresponding GPE was enabled.

Likewise, if the ACPI device has the _PRW object, the _PSW/_DSW object will be
called in order to disable the wakeup functionality. But when it is allowed to
wake up the sleeping state, OSPM will enable it again.

So we should disable PME in the course of scanning PCI devices and enable it
again only when PME events are actually required to be generated from the
requested PCI state (for example, D3_hot or D3_cold). It is also safe to
disable PME again when the PME is disabled for the PCI devices.

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Li Shaohua <shaohua.li@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 70b9f7dc 28-Apr-2008 Yinghai Lu <yhlu.kernel.send@gmail.com>

x86/pci: remove flag in pci_cfg_space_size_ext

so let pci_cfg_space_size call it directly without flag.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>


# 30a18d6c 19-Feb-2008 Yinghai Lu <Yinghai.Lu@Sun.COM>

x86: multi pci root bus with different io resource range, on 64-bit

scan AMD opteron io/mmio routing to make sure every pci root bus get correct
resource range. Thus later pci scan could assign correct resource to device
with unassigned resource.

this can fix a system without _CRS for multi pci root bus.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 0d358f22 19-Feb-2008 Yinghai Lu <Yinghai.Lu@Sun.COM>

driver core: try parent numa_node at first before using default

in the device_add, we try to use use parent numa_node.
need to make sure pci root bus's bridge device numa_node is set.
then we could use device->numa_node direclty for all device.
and don't need to call pcibus_to_node().

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 57741a77 15-Feb-2008 Yinghai Lu <Yinghai.Lu@Sun.COM>

x86_64: set cfg_size for AMD Family 10h in case MMCONFIG

reuse pci_cfg_space_size but skip check pci express and pci-x CAP ID.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7f7b5de2 18-Apr-2008 Adrian Bunk <bunk@kernel.org>

PCI: pci_scan_device() mustn't be __devinit

WARNING: drivers/pci/built-in.o(.text+0x150f): Section mismatch in reference from the function pci_scan_single_device() to the function .devinit.text:pci_scan_device()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# cbd4e055 18-Apr-2008 Adrian Bunk <bunk@kernel.org>

PCI: pci_alloc_child_bus() mustn't be __devinit

WARNING: drivers/pci/built-in.o(.text+0xc4c): Section mismatch in reference from the function pci_add_new_bus() to the function .devinit.text:pci_alloc_child_bus()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 88452565 30-Mar-2008 Ivan Kokshaysky <ink@jurassic.park.msu.ru>

PCI: clean up resource alignment management

Done per Linus' request and suggestions. Linus has explained that
better than I'll be able to explain:

On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote:
> Actually, before we go any further, there might be a less intrusive
> alternative: add just a couple of flags to the resource flags field (we
> still have something like 8 unused bits on 32-bit), and use those to
> implement a generic "resource_alignment()" routine.
>
> Two flags would do it:
>
> - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device
> resources)
>
> - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources
> during probing)
>
> and then the case of both flags zero (or both bits set) would actually be
> "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we
> actually allocate the resource (so that we don't use the "start" field as
> alignment incorrectly when it no longer indicates alignment).
>
> That wouldn't be totally generic, but it would have the nice property of
> automatically at least add sanity checking for that whole "res->start has
> the odd meaning of 'alignment' during probing" and remove the need for a
> new field, and it would allow us to have a generic "resource_alignment()"
> routine that just gets a resource pointer.

Besides, I removed IORESOURCE_BUS_HAS_VGA flag which was unused for ages.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 94e61088 05-Mar-2008 Ben Hutchings <bhutchings@solarflare.com>

PCI: Expose PCI VPD through sysfs

Vital Product Data (VPD) may be exposed by PCI devices in several
ways. It is generally unsafe to read this information through the
existing interfaces to user-land because of stateful interfaces.

This adds:
- abstract operations for VPD access (struct pci_vpd_ops)
- VPD state information in struct pci_dev (struct pci_vpd)
- an implementation of the VPD access method specified in PCI 2.2
(in access.c)
- a 'vpd' binary file in sysfs directories for PCI devices with VPD
operations defined

It adds a probe for PCI 2.2 VPD in pci_scan_device() and release of
VPD state in pci_release_dev().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 7d715a6c 24-Feb-2008 Shaohua Li <shaohua.li@intel.com>

PCI: add PCI Express ASPM support

PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.

This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.

In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.

Note: some devices might not work well with aspm, either because chipset
issue or device issue. The patch provide API (pci_disable_link_state),
driver can disable ASPM for specific device.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# cb3576fa 08-Feb-2008 Gary Hade <garyhade@us.ibm.com>

PCI: Include PCI domain in PCI bus names on x86/x86_64

The PCI bus names included in /proc/iomem and /proc/ioports are
of the form 'PCI Bus #XX' where XX is the bus number. This patch
changes the naming to 'PCI Bus XXXX:YY' where XXXX is the domain
number and YY is the bus number. For example, PCI bus 14 in
domain 0 will show as 'PCI Bus 0000:14' instead of 'PCI Bus #14'.
This change makes the naming consistent with other architectures
such as ia64 where multiple PCI domain support has been around
longer.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 5ff580c1 14-Feb-2008 Greg Kroah-Hartman <gregkh@suse.de>

PCI: remove global list of PCI devices

This patch finally removes the global list of PCI devices. We are
relying entirely on the list held in the driver core now, and do not
need a separate "shadow" list as no one uses it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 8a1bc901 14-Feb-2008 Greg Kroah-Hartman <gregkh@suse.de>

PCI: add is_added flag to struct pci_dev

This lets us check if the device is really added to the driver core or
not, which is what we need when walking some of the bus lists. The flag
is there in anticipation of getting rid of the other PCI device list,
which is what we used to check in this situation.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 70308923 13-Feb-2008 Greg Kroah-Hartman <gregkh@suse.de>

PCI: make no_pci_devices() use the pci_bus_type list

no_pci_devices() should use the driver core list of PCI devices, not our
"separate" one.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 39106dcf 08-Apr-2008 Mike Travis <travis@sgi.com>

cpumask: use new cpus_scnprintf function

* Cleaned up references to cpumask_scnprintf() and added new
cpulist_scnprintf() interfaces where appropriate.

* Fix some small bugs (or code efficiency improvments) for various uses
of cpumask_scnprintf.

* Clean up some checkpatch errors.

Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 0ab2b57f 17-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

PCI: fix section mismatch warning in pci_scan_child_bus

Fix following warning:
WARNING: vmlinux.o(.text+0x47bdb1): Section mismatch in reference from the function pci_scan_child_bus() to the function .devinit.text:pcibios_fixup_bus()

We had plenty of functions that could be annotated __devinit but due to
the former restriction that exported symbols could not be annotated
they were not so. So annotate these function and fix the references
from the pci/hotplug/* code to silence the resuting warnings.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 59fc67de 04-Feb-2008 FUJITA Tomonori <tomof@acm.org>

iommu sg merging: PCI: add dma segment boundary support

This adds PCI's accessor for segment_boundary_mask in device_dma_parameters.

The default segment_boundary is set to 0xffffffff, same to the block layer's
default value (and the scsi mid layer uses the same value).

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 4d57cdfa 04-Feb-2008 FUJITA Tomonori <tomof@acm.org>

iommu sg merging: PCI: add device_dma_parameters support

This adds struct device_dma_parameters in struct pci_dev and properly
sets up a pointer in struct device.

The default max_segment_size is set to 64K, same to the block layer's
default value.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Mostly-acked-by: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 451124a7 02-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

PCI: fix 4x section mismatch warnings

The following warnings were issued during build of
drivers/pci with an allyesconfig build:
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0xdaf): Section mismatch in reference from the function pci_add_new_bus() to the function .devinit.text:pci_alloc_child_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x15e2): Section mismatch in reference from the function pci_scan_single_device() to the function .devinit.text:pci_scan_device()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x1b0c5): Section mismatch in reference from the function pci_bus_assign_resources() to the function .devinit.text:pci_setup_bridge()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x1b32d): Section mismatch in reference from the function pci_bus_size_bridges() to the function .devinit.text:pci_bus_size_cardbus()

Investigating each case closer it looked like all
referred functions are only used in the init phase
or during hotplug.
So to avoid wasting too much memory in the non-hotplug
case the simpler fix was to allow the fuctions to
use code/data from the __devinit sections.
This was done in all four case by adding the __ref
annotation.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 4105717b 02-Feb-2008 Sam Ravnborg <sam@ravnborg.org>

PCI: fix section mismatch warnings referring to pci_do_scan_bus

Fix following warnings:
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0xb054): Section mismatch in reference from the function cpci_configure_slot() to the function .devinit.text:pci_do_scan_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(.text+0x153ab): Section mismatch in reference from the function shpchp_configure_device() to the function .devinit.text:pci_do_scan_bus()
WARNING: o-x86_64/drivers/pci/built-in.o(__ksymtab+0xc0): Section mismatch in reference from the variable __ksymtab_pci_do_scan_bus to the function .devinit.text:pci_do_scan_bus()

PCI hotplug were the only user of pci_do_scan_bus()
so moving this function to a separate file that is build
only when we enable CONFIG_HOTPLUG_PCI.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# cc3a1378 02-Feb-2008 Greg Kroah-Hartman <gregkh@suse.de>

Revert "PCI: PCIE ASPM support"

This reverts commit 6c723d5bd89f03fc3ef627d50f89ade054d2ee3b.

It caused build errors on non-x86 platforms, config file confusion, and
even some boot errors on some x86-64 boxes. All around, not quite ready
for prime-time :(

Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# fd7d1ced 22-May-2007 Greg Kroah-Hartman <gregkh@suse.de>

PCI: make pci_bus a struct device

This moves the pci_bus class device to be a real struct device and at
the same time, place it in the device tree in the correct location.

Note, the old "bridge" symlink is now gone, but this was a non-standard
link and no userspace program used it. If you need to determine the
device that the bus is on, follow the standard device symlink, or walk
up the device tree.


Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 6c723d5b 23-Jan-2008 Shaohua Li <shaohua.li@intel.com>

PCI: PCIE ASPM support

PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.

This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state
and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.

In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# a6f29a98 19-Nov-2007 Joe Perches <joe@perches.com>

PCI: Add missing "space" in printk messages

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 943e6c0d 21-Nov-2007 Adrian Bunk <bunk@kernel.org>

PCI: remove additional pci_scan_child_bus() prototype

There's already a prototype for pci_scan_child_bus() at the correct place in
pci.h, so there's no reason for an additional one.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# b73e9687 21-Nov-2007 Adrian Bunk <bunk@kernel.org>

PCI: always export pci_scan_single_device

This patch fixes the following build error with CONFIG_HOTPLUG=n:

MODPOST 2137 modules
ERROR: "pci_scan_single_device" [drivers/edac/i82875p_edac.ko] undefined!

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# eb003ec2 26-Oct-2007 Adrian Bunk <bunk@kernel.org>

PCI: drivers/pci/: remove unused exports

This patch removes the following unused exports:
- remove the following unused EXPORT_SYMBOL's:
- pci-acpi.c: pci_osc_support_set
- proc.c: pci_proc_detach_bus
- remove the following unused EXPORT_SYMBOL_GPL's:
- bus.c: pci_walk_bus
- probe.c: pci_create_bus
- setup-res.c: pci_claim_resource

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# b249072e 01-Nov-2007 Greg Kroah-Hartman <gregkh@suse.de>

driver core: add way to get to bus device klist

This allows an easier way to get to the device klist associated with a
struct bus_type (you have three to choose from...) This will make it
easier to move these fields to be dynamic in a future patch.

The only user of this is the PCI core which horribly abuses this
interface to rearrange the order of the pci devices. This should be
done using the existing bus device walking functions, but that's left
for future patches.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# ad7edfe0 27-Dec-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

[PCI] Do not enable CRS Software Visibility by default

It appears that some PCI-E bridges do the wrong thing in the presense of
CRS Software Visibility and MMCONFIG. In particular, it looks like an
ATI bridge (device ID 7936) will return 0001 in the vendor ID field of
any bridged devices indefinitely.

Not enabling CRS SV avoids the problem, and as we currently do not
really make good use of the feature anyway (we just time out rather than
do any threaded discovery as suggested by the CRS specs), we're better
off just not enabling it.

This should fix a slew of problem reports with random devices (generally
graphics adapters or fairly high-performance networking cards, since it
only affected PCI-E) not getting properly recognized on these AMD systems.

If we really want to use CRS-SV, we may end up eventually needing a
whitelist of systems where this should be enabled, along with some kind
of "pcibios_enable_crs()" query to call the system-specific code.

Suggested-by: Loic Prylli <loic@myri.com>
Tested-by: Kai Ruhnau <kai@tragetaschen.dyndns.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# bb446093 11-Dec-2007 Gary Hade <garyhade@us.ibm.com>

PCI: Restore PCI expansion ROM P2P prefetch window creation

Restore PCI expansion ROM P2P prefetch window creation.

This patch reverts previous "Avoid creating P2P prefetch
window for expansion ROMs" change due to regressions that
were spotted on some systems.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# af1bff4f 10-Dec-2007 Linus Torvalds <torvalds@woody.linux-foundation.org>

Revert "PCI: fix IDE legacy mode resources"

This reverts commit fd6e732186ab522c812ab19c2c5e5befb8ec8115, which
helped up things on MIPS, but was wrong for everything else. As Ralf
Baechle puts it:

"It seems the whole MIPS resource managment is complicated enough (out
of necessity) that only a few people actually grok it. Ioports being
actually memory mapped on MIPS only makes the confusion worse, sigh."

Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 994a65e2 21-Oct-2007 Keshavamurthy, Anil S <anil.s.keshavamurthy@intel.com>

Intel IOMMU: PCI generic helper function

When devices are under a p2p bridge, upstream transactions get replaced by the
device id of the bridge as it owns the PCIE transaction. Hence its necessary
to setup translations on behalf of the bridge as well. Due to this limitation
all devices under a p2p share the same domain in a DMAR.

We just cache the type of device, if its a native PCIe device
or not for later use.

[akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 11949255 08-Oct-2007 Gary Hade <garyhade@us.ibm.com>

PCI: modify PCI bridge control ISA flag for clarity

Modify PCI Bridge Control ISA flag for clarity

This patch changes PCI_BRIDGE_CTL_NO_ISA to PCI_BRIDGE_CTL_ISA
and modifies it's clarifying comment and locations where used.
The change reduces the chance of future confusion since it makes
the set/unset meaning of the bit the same in both the bridge
control register and bridge_ctl field of the pci_bus struct.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# fd64cb46 03-Oct-2007 Gary Hade <garyhade@us.ibm.com>

PCI: avoid P2P prefetch window for expansion ROMs

Avoid creating P2P prefetch window for expansion ROMs

Because of the future possibility that P2P prefetch windows will contain
address ranges above 4GB some BIOSes are providing space in the P2P
non-prefetch windows for expansion ROMs. This is due to expansion ROM
BAR 32-bit limitation. When expansion ROM BARs without BIOS assigned
address(es) are currently found behind a P2P bridge, the kernel attempts
to create a P2P prefetch window for them even though space for them has
already been provided in the non-prefetch window. _CRS on some systems
with certain resource conservation conscious BIOSes may not provide the
extra 1MB or more memory resource needed for the expansion ROM motivated
prefetch window causing resource allocation errors.

This change corrects the problem by removing IORESOURCE_PREFETCH from
the expansion ROM flags initialization. It also removes
IORESOURCE_CACHEABLE which seems inappropriate if only non-cacheable
memory is available.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 036fff4c 03-Oct-2007 Gary Hade <garyhade@us.ibm.com>

PCI: skip ISA ioresource alignment on some systems

Skip ISA ioresource alignment on some systems

To conserve limited PCI i/o resource on some IBM multi-node systems, the
BIOS allocates (via _CRS) and expects the kernel to use addresses in
ranges currently excluded by pcibios_align_resource() [i386/pci/i386.c].
This change allows the kernel to use the currently excluded address
ranges on the IBM x3800, x3850, and x3950.

Signed-off-by: Gary Hade <gary.hade@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# fd6e7321 02-Oct-2007 Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>

PCI: fix IDE legacy mode resources

I got the following error on MIPS Cobalt.

PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 0 (errno=-16)
PCI: Unable to reserve I/O region #3:8@f0000170 for device 0000:00:09.1
pata_via 0000:00:09.1: failed to request/iomap BARs for port 1 (errno=-16)
pata_via 0000:00:09.1: no available native port

The legacy mode IDE resources set the following order.

pci_setup_device()
Legacy mode ATA controllers have fixed addresses.
IDE resources: 0x1F0-0x1F7, 0x3F6, 0x170-0x177, 0x376
|
V
pcibios_fixup_bus()
MIPS Cobalt PCI bus regions have the -0x10000000 offset from PCI resources.
pcibios_fixup_bus() fix PCI bus regions.
0x1F0 - 0x10000000 = 0xF00001F0
|
V
ata_pci_init_one()
PCI: Unable to reserve I/O region #1:8@f00001f0 for device 0000:00:09.1

In some architectures, PCI bus regions have the offset from PCI resources.
For this reason, pci_setup_device() should set PCI bus regions to
dev->resource[].

[akpm@linux-foundation.org: use struct initialiser]
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e365c3e7 23-Aug-2007 Ralf Baechle <ralf@linux-mips.org>

PCI: remove devinit from pci_read_bridge_bases

On MIPS with PCI && !HOTPLUG, I'm currently getting the following modpost
warning:

MODPOST vmlinux.o
WARNING: vmlinux.o(.text+0x1ce128): Section mismatch: reference to .init.text:pci_read_bridge_bases (between 'pcibios_fixup_bus' and 'pcibios_enable_device')

On MIPS I have the call chains pci_scan_child_bus -> pcibios_fixup_bus ->
pci_read_bridge_bases. pci_scan_child_bus can't be __devinit because it
it is an exported symbol, thus pcibios_fixup_bus and pci_read_bridge_bases
can't be either.

For some reason I don't see this issue on x86; I blame compiler differences.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# d55bef51 30-Jul-2007 Bernhard Kaindl <bk@suse.de>

PCI: lets kill the 'PCI hidden behind bridge' message

Adrian Bunk wrote:
> Alois Nešpor wrote
>> PCI: Bus #0b (-#0e) is hidden behind transparent bridge #0a (-#0b) (try 'pci=assign-busses')
>> Please report the result to linux-kernel to fix this permanently"
>>
>> dmesg:
>> "Yenta: Raising subordinate bus# of parent bus (#0a) from #0b to #0e"
>> without pci=assign-busses and nothing with pci=assign-busses.
>
> Bernhard?

Ok, lets kill the message. As Alois Nešpor also saw, that's fixed up by Yenta,
so PCI does not have to warn about it. PCI could still warn about it if
is_cardbus is 0 in that instance of pci_scan_bridge(), but so far I have
not seen a report where this would have been the case so I think we can
spare the kernel of that check (removes ~300 lines of asm) unless debugging
is done.

History: The whole check was added in the days before we had the fixup
for this in Yenta and pci=assign-busses was the only way to get CardBus
cards detected on many (not all) of the machines which give this warning.

In theory, there could be cases when this warning would be triggered and
it's not cardbus, then the warning should still apply, but I think this
should only be the case when working on a completely broken PCI setup,
but one may have already enabled the debug code in drivers/pci and the
patched check would then trigger.

I do not sign this off yet because it's completely untested so far, but
everyone is free to test it (with the #ifdef DEBUG replaced by #if 1 and
pr_debug( changed to printk(.

We may also dump the whole check (remove everything within the #ifdef from
the source) if that's perferred.

On Alois Nešpor's machine this would then (only when debugging) this message:

"PCI: Bus #0b (-#0e) is partially hidden behind transparent bridge #0a (-#0b)"

"partially" should be in the message on his machine because #0b of #0b-#0e
is reachable behind #0a-#0b, but not #0c-#0e.

But that differentiation is now moot anyway because the fixup in Yenta takes
care of it as far as I could see so far, which means that unless somebody
is debugging a totally broken PCI setup, this message is not needed anymore,
not even for debugging PCI.


Ok, here the patch with the following changes:

* Refined to say that the bus is only partially hidden when the parent
bus numbers are not totally way off (outside of) the child bus range
* remove the reference to pci=assign-busses and the plea to report it

We could add a pure source code-only comment to keep a reference to
pci=assign-busses the in case when this is triggered by someone who
is debugging the cause of this message and looking the way to solve it.

From: Bernhard Kaindl <bk@suse.de>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# ed4aaadb 16-Jul-2007 Zhang, Yanmin <yanmin_zhang@linux.intel.com>

fix jvc cdrom drive lockup

Before calling init_hwif_default, ide_unregister gets lock ide_lock and
disables irq. init_hwif_default calls ide_default_io_base which calls
pci_get_device and later pci_get_subsys tries to apply for semaphore
pci_bus_sem and goes to sleep.

Mostly, pci_get_device should be called when irq is turned on.

ide_default_io_base just needs find if list pci_devices is empty.

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7b595756 13-Jun-2007 Tejun Heo <htejun@gmail.com>

sysfs: kill unnecessary attribute->owner

sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# b8a3a521 08-Jun-2007 Auke Kok <auke-jan.h.kok@intel.com>

PCI: read revision ID by default

Currently there are 97 occurrences where drivers need the pci
revision ID. We can do this once for all devices. Even the pci
subsystem needs the revision several times for quirks. The extra
u8 member pads out nicely in the pci_dev struct.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# a23adb5b 07-Jun-2007 Greg Kroah-Hartman <gregkh@suse.de>

PCI: point people to Bernhard instead of the linux-kernel list

Back in commit 8c4b2cf9af9b4ecc29d4f0ec4ecc8e94dc4432d7, Bernhard said
that he would fix up all instances of when this message happens. So
point people at him instead of the linux-kernel list which can not fix
things up.

Cc: Bernhard Kaindl <bk@suse.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 4aa9bc95 05-Apr-2007 Michael Ellerman <michael@ellerman.id.au>

MSI: Use a list instead of the custom link structure

The msi descriptors are linked together with what looks a lot like
a linked list, but isn't a struct list_head list. Make it one.

The only complication is that previously we walked a list of irqs, and
got the descriptor for each with get_irq_msi(). Now we have a list of
descriptors and need to get the irq out of it, so it needs to be in the
actual struct msi_desc. We use 0 to indicate no irq is setup.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# bab41e9b 05-Apr-2007 Michael Ellerman <michael@ellerman.id.au>

PCI: Convert to alloc_pci_dev()

Convert code that allocs a struct pci_dev to use alloc_pci_dev().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 65891215 05-Apr-2007 Michael Ellerman <michael@ellerman.id.au>

PCI: Create alloc_pci_dev(), the one true way to create a struct pci_dev

There are currently several places in the kernel where we kmalloc()
a struct pci_dev and start initialising it. It'd be preferable to
have an allocator so we can ensure the pci_dev is correctly initialised
in one place.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 96bde06a 26-Mar-2007 Sam Ravnborg <sam@ravnborg.org>

pci: do not mark exported functions as __devinit

Functions marked __devinit will be removed after kernel init. But being
exported they are potentially called by a module much later.

So the safer choice seems to be to keep the function even in the non
CONFIG_HOTPLUG case.

This silence the follwoing section mismatch warnings:
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_device from __ksymtab_gpl between '__ksymtab_pci_bus_add_device' (at offset 0x20) and '__ksymtab_pci_walk_bus'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_create_bus from __ksymtab_gpl between '__ksymtab_pci_create_bus' (at offset 0x40) and '__ksymtab_pci_stop_bus_device'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_max_busnr from __ksymtab_gpl between '__ksymtab_pci_bus_max_busnr' (at offset 0xc0) and '__ksymtab_pci_assign_resource_fixed'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_claim_resource from __ksymtab_gpl between '__ksymtab_pci_claim_resource' (at offset 0xe0) and '__ksymtab_pcie_port_bus_type'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_devices from __ksymtab between '__ksymtab_pci_bus_add_devices' (at offset 0x70) and '__ksymtab_pci_bus_alloc_resource'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_scan_bus_parented from __ksymtab between '__ksymtab_pci_scan_bus_parented' (at offset 0x90) and '__ksymtab_pci_root_buses'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_assign_resources from __ksymtab between '__ksymtab_pci_bus_assign_resources' (at offset 0x4d0) and '__ksymtab_pci_bus_size_bridges'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_size_bridges from __ksymtab between '__ksymtab_pci_bus_size_bridges' (at offset 0x4e0) and '__ksymtab_pci_setup_cardbus'

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 01abc2aa 23-Apr-2007 Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

Revert "adjust legacy IDE resource setting (v2)"

This reverts commit ed8ccee0918ad063a4741c0656fda783e02df627.

It causes hang on boot for some users and we don't yet know why:

http://bugzilla.kernel.org/show_bug.cgi?id=7562

http://lkml.org/lkml/2007/4/20/404
http://lkml.org/lkml/2007/3/25/113

Just reverse it for 2.6.21-final, having broken X server is somehow
better than unbootable system.

Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>


# ed8ccee0 03-Mar-2007 Jan Beulich <jbeulich@novell.com>

adjust legacy IDE resource setting (v2)

The change to force legacy mode IDE channels' resources to fixed non-zero
values confuses (at least some versions of) X, because the values reported
by the kernel and those readable from PCI config space aren't consistent
anymore. Therefore, this patch arranges for the respective BARs to also
get updated if possible.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>


# 0fcfdabb 25-Jan-2007 Michael Ellerman <michael@ellerman.id.au>

MSI: Remove pci_scan_msi_device()

pci_scan_msi_device() doesn't do anything anymore, so remove it.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 07eddf3d 29-Nov-2006 Yinghai Lu <yinghai.lu@amd.com>

PCI: check szhi when sz is 0 when 64 bit iomem bigger than 4G

For pci mem resource that size is bigger than 4G, the sz returned by
pc_size will be 0.
So that resource is skipped, and register contained hi address will be
treated as another 32bit resource. We need to use sz64 and pci_sz64 for
64 bit resource for clear logical. Typical usages for this: Opteron
system with co-processor and the co-processor could take more than 4G
RAM as pre-fetchable mem resource.


Signed-off-by: Yinghai Lu <yinghai.lu@amd.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 76e6a1d6 29-Dec-2006 Randy Dunlap <randy.dunlap@oracle.com>

[PATCH] pci/probe: fix macro that confuses kernel-doc

Don't have macros between a function's kernel-doc block and the function
definition. This is not valid for kernel-doc.

Warning(/var/linsrc/linux-2.6.20-rc1-git8//drivers/pci/probe.c:653): No description found for parameter 'IORESOURCE_PCI_FIXED'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# fb0f2b40 19-Dec-2006 Ralf Baechle <ralf@linux-mips.org>

PCI legacy resource fix

Since commit 368c73d4f689dae0807d0a2aa74c61fd2b9b075f the kernel will try
to update the non-writeable BAR registers 0..3 of PIIX4 IDE adapters if
pci_assign_unassigned_resources() is used to do full resource assignment of
the bus. This fails because in the PIIX4 these BAR registers have
implicitly assumed values and read back as zero; it used to work because
the kernel used to just write zero to that register the read back value did
match what was written.

The fix is a new resource flag IORESOURCE_PCI_FIXED used to mark a resource
as non-movable. This will also be useful to keep other import system
resources from being moved around - for example system consoles on PCI
busses.

[akpm@osdl.org: cleanup]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 87348136 06-Dec-2006 Christoph Hellwig <hch@lst.de>

[PATCH] add numa node information to struct device

For node-aware skb allocations we need information about the node in struct
net_device or struct device. Davem suggested to put it into struct device
which this patch does.

In particular:

- struct device gets a new int numa_node member if CONFIG_NUMA is set
- there are two new helpers, dev_to_node and set_dev_node to
transparently deal with the non-numa case
- for pci devices the node-info is set to the value we get from
pcibus_to_node.

Note that for some architectures pcibus_to_node doesn't work yet at the time
we call it currently. This is harmless and will just mean skb allocations
aren't node-local on this architectures until the implementation of
pcibus_to_node on these architectures have been updated (There are patches for
x86 and x86_64 floating around)

[akpm@osdl.org: cleanup]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 368c73d4 03-Oct-2006 Alan Cox <alan@lxorguk.ukuu.org.uk>

PCI: quirks: fix the festering mess that claims to handle IDE quirks

The number of permutations of crap we do is amazing and almost all of it
has the wrong effect in 2.6.

At the heart of this is the PCI SFF magic which says that compatibility
mode PCI IDE controllers use ISA IRQ routing and hard coded addresses
not the BAR values. The old quirks variously clears them, sets them,
adjusts them and then IDE ignores the result.

In order to drive all this garbage out and to do it portably we need to
handle the SFF rules directly and properly. Because we know the device
BAR 0-3 are not used in compatibility mode we load them with the values
that are implied (and indeed which many controllers actually
thoughtfully put there in this mode anyway).

This removes special cases in the IDE layer and libata which now knows
that bar 0/1/2/3 always contain the correct address. It means our
resource allocation map is accurate from boot, not "mostly accurate"
after ide is loaded, and it shoots lots of code. There is also lots more
code and magic constant knowledge to shoot once this is in and settled.

Been in my test tree for a while both with drivers/ide and with libata.
Wants some -mm shakedown in case I've missed something dumb or there are
corner cases lurking.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 6b4b78fe 29-Sep-2006 Matt Domsch <Matt_Domsch@dell.com>

PCI: optionally sort device lists breadth-first

Problem:
New Dell PowerEdge servers have 2 embedded ethernet ports, which are
labeled NIC1 and NIC2 on the chassis, in the BIOS setup screens, and
in the printed documentation. Assuming no other add-in ethernet ports
in the system, Linux 2.4 kernels name these eth0 and eth1
respectively. Many people have come to expect this naming. Linux 2.6
kernels name these eth1 and eth0 respectively (backwards from
expectations). I also have reports that various Sun and HP servers
have similar behavior.


Root cause:
Linux 2.4 kernels walk the pci_devices list, which happens to be
sorted in breadth-first order (or pcbios_find_device order on i386,
which most often is breadth-first also). 2.6 kernels have both the
pci_devices list and the pci_bus_type.klist_devices list, the latter
is what is walked at driver load time to match the pci_id tables; this
klist happens to be in depth-first order.

On systems where, for physical routing reasons, NIC1 appears on a
lower bus number than NIC2, but NIC2's bridge is discovered first in
the depth-first ordering, NIC2 will be discovered before NIC1. If the
list were sorted breadth-first, NIC1 would be discovered before NIC2.

A PowerEdge 1955 system has the following topology which easily
exhibits the difference between depth-first and breadth-first device
lists.

-[0000:00]-+-00.0 Intel Corporation 5000P Chipset Memory Controller Hub
+-02.0-[0000:03-08]--+-00.0-[0000:04-07]--+-00.0-[0000:05-06]----00.0-[0000:06]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC2, 2.4 kernel name eth1, 2.6 kernel name eth0)
+-1c.0-[0000:01-02]----00.0-[0000:02]----00.0 Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (labeled NIC1, 2.4 kernel name eth0, 2.6 kernel name eth1)


Other factors, such as device driver load order and the presence of
PCI slots at various points in the bus hierarchy further complicate
this problem; I'm not trying to solve those here, just restore the
device order, and thus basic behavior, that 2.4 kernels had.


Solution:

The solution can come in multiple steps.

Suggested fix #1: kernel
Patch below optionally sorts the two device lists into breadth-first
ordering to maintain compatibility with 2.4 kernels. It adds two new
command line options:
pci=bfsort
pci=nobfsort
to force the sort order, or not, as you wish. It also adds DMI checks
for the specific Dell systems which exhibit "backwards" ordering, to
make them "right".


Suggested fix #2: udev rules from userland
Many people also have the expectation that embedded NICs are always
discovered before add-in NICs (which this patch does not try to do).
Using the PCI IRQ Routing Table provided by system BIOS, it's easy to
determine which PCI devices are embedded, or if add-in, which PCI slot
they're in. I'm working on a tool that would allow udev to name
ethernet devices in ascending embedded, slot 1 .. slot N order,
subsort by PCI bus/dev/fn breadth-first. It'll be possible to use it
independent of udev as well for those distributions that don't use
udev in their installers.

Suggested fix #3: system board routing rules
One can constrain the system board layout to put NIC1 ahead of NIC2
regardless of breadth-first or depth-first discovery order. This adds
a significant level of complexity to board routing, and may not be
possible in all instances (witness the above systems from several
major manufacturers). I don't want to encourage this particular train
of thought too far, at the expense of not doing #1 or #2 above.


Feedback appreciated. Patch tested on a Dell PowerEdge 1955 blade
with 2.6.18.

You'll also note I took some liberty and temporarily break the klist
abstraction to simplify and speed up the sort algorithm. I think
that's both safe and appropriate in this instance.


Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# b19441af 28-Aug-2006 Greg Kroah-Hartman <gregkh@suse.de>

PCI: fix __must_check warnings

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 82081797 10-Jul-2006 Linas Vepstas <linas@austin.ibm.com>

[PATCH] pci: initialize struct pci_dev.error_state

The pci channel state is currently uninitialized, thus there are two ways
of indicating that "everything's OK": 0 and 1. This is a bit of a burden.

If a devce driver wants to check if the pci channel is in a working or a
disconnected state, the driver writer must perform checks similar to

if((pdev->error_state != 0) &&
(pdev->error_state != pci_channel_io_normal)) {
whatever();
}

which is rather akward. The first check is needed because stuct pci_dev is
inited to all-zeros. The scond is needed because the error recovery will
set the state to pci_channel_io_normal (which is not zero).

This patch fixes this awkwardness.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# d71374da 01-Jun-2006 Zhang Yanmin <yanmin.zhang@intel.com>

[PATCH] PCI: fix race with pci_walk_bus and pci_destroy_dev

pci_walk_bus has a race with pci_destroy_dev. When cb is called
in pci_walk_bus, pci_destroy_dev might unlink the dev pointed by next.
Later on in the next loop, pointer next becomes NULL and cause
kernel panic.

Below patch against 2.6.17-rc4 fixes it by changing pci_bus_lock (spin_lock)
to pci_bus_sem (rw_semaphore).

Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# ea28502d 09-Jun-2006 Bjorn Helgaas <bjorn.helgaas@hp.com>

[PATCH] PCI: fix to pci ignore pre-set 64-bit bars on 32-bit platforms

When we detect a 64-bit pre-set address in a BAR on a 32-bit platform,
we disable it and treat it as if it had been unset, thus allowing the
general address assignment code to assign a new address to it when the
device is enabled. This can happen either if the firmware assigns
64-bit addresses; additionally, some cards have been found "in the
wild" which do not come out of reset with all the BAR registers set to
zero.

Unfortunately, the patch that implemented this tested the low part of
the address instead of the high part of the address. This patch fixes
that.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 17d6dc8f 18-Apr-2006 H. Peter Anvin <hpa@c2micro.com>

[PATCH] PCI: Ignore pre-set 64-bit BARs on 32-bit platforms

[pci] Ignore pre-set 64-bit BARs on 32-bit platforms

Currently, Linux always rejects a device which has a pre-set 64-bit
address on a 32-bit platform. On systems which do not do PCI
initialization in firmware, this causes some devices which don't
correctly power up with all BARs zero to fail.

This patch makes the kernel automatically zero out such an address
(thus treating it as if it had not been set at all, meaning it will
assign an address if necessary).

I have done this only for devices, not bridges. It seems potentially
hazardous to do for bridges.

Signed-off-by: H. Peter Anvin <hpa@c2micro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# f5afe806 28-Feb-2006 Eric Sesterhenn <snakebyte@gmx.de>

[PATCH] PCI: kzalloc() conversion in drivers/pci

this patch converts drivers/pci to kzalloc usage.
Compile tested with allyes config.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 8c4b2cf9 18-Feb-2006 Bernhard Kaindl <bk@suse.de>

[PATCH] PCI: PCI/Cardbus cards hidden, needs pci=assign-busses to fix

"In some cases, especially on modern laptops with a lot of PCI and cardbus
bridges, we're unable to assign correct secondary/subordinate bus numbers
to all cardbus bridges due to BIOS limitations unless we are using
"pci=assign-busses" boot option." -- Ivan Kokshaysky (from a patch comment)

Without it, Cardbus cards inserted are never seen by PCI because the parent
PCI-PCI Bridge of the Cardbus bridge will not pass and translate Type 1 PCI
configuration cycles correctly and the system will fail to find and
initialise the PCI devices in the system.

Reference: PCI-PCI Bridges: PCI Configuration Cycles and PCI Bus Numbering:
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node72.html

The reason for this is that:
``All PCI busses located behind a PCI-PCI bridge must reside between the
secondary bus number and the subordinate bus number (inclusive).''

"pci=assign-busses" makes pcibios_assign_all_busses return 1 and this
turns on PCI renumbering during PCI probing.

Alan suggested to use DMI automatically set assign-busses on problem systems.

The only question for me was where to put it. I put it directly before
scanning PCI bus into pcibios_scan_root() because it's called from legacy,
acpi and numa and so it can be one place for all systems and configurations
which may need it.

AMD64 Laptops are also affected and fixed by assign-busses, and the code is
also incuded from arch/x86_64/pci/ that place will also work for x86_64
kernels, I only ifdef'-ed the x86-only Laptop in this example.

Affected and known or assumed to be fixed with it are (found by googling):

* ASUS Z71V and L3s
* Samsung X20
* Compaq R3140us and all Compaq R3000 series laptops with TI1620 Controller,
also Compaq R4000 series (from a kernel.org bugreport)
* HP zv5000z (AMD64 3700+, known that fixup_parent_subordinate_busnr fixes it)
* HP zv5200z
* IBM ThinkPad 240
* An IBM ThinkPad (1.8 GHz Pentium M) debugged by Pavel Machek
gives the correspondig message which detects the possible problem.
* MSI S260 / Medion SIM 2100 MD 95600

The patch also expands the "try pci=assign-busses" warning so testers will
help us to update the DMI table.

Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 6e325a62 14-Feb-2006 Michael S. Tsirkin <mst@mellanox.co.il>

[PATCH] PCI: make MSI quirk inheritable from the pci bus

It turns out AMD 8131 quirk only affects MSI for devices behind the 8131 bridge.
Handle this by adding a flags field in pci_bus, inherited from parent to child.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# bbe8f9a3 14-Feb-2006 Ralf Baechle <ralf@linux-mips.org>

[PATCH] PCI: Avoid leaving MASTER_ABORT disabled permanently when returning from pci_scan_bridge.

> On Mon, Feb 13, 2006 at 05:13:21PM -0800, David S. Miller wrote:
> >
> > In drivers/pci/probe.c:pci_scan_bridge(), if this is not the first
> > pass (pass != 0) we don't restore the PCI_BRIDGE_CONTROL_REGISTER and
> > thus leave PCI_BRIDGE_CTL_MASTER_ABORT off:
> >
> > int __devinit pci_scan_bridge(struct pci_bus *bus, struct pci_dev * dev, int max, int pass)
> > {
> > ...
> > /* Disable MasterAbortMode during probing to avoid reporting
> > of bus errors (in some architectures) */
> > pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &bctl);
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL,
> > bctl & ~PCI_BRIDGE_CTL_MASTER_ABORT);
> > ...
> > if ((buses & 0xffff00) && !pcibios_assign_all_busses() && !is_cardbus) {
> > unsigned int cmax, busnr;
> > /*
> > * Bus already configured by firmware, process it in the first
> > * pass and just note the configuration.
> > */
> > if (pass)
> > return max;
> > ...
> > }
> >
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, bctl);
> > ...
> >
> > This doesn't seem intentional.

Agreed, looks like an accident. The patch [1] originally came from Kip
Walker (Broadcom back then) between 2.6.0-test3 and 2.6.0-test4. As I
recall it was supposed to fix an issue with with PCI aborts being
signalled by the PCI bridge of the Broadcom BCM1250 family of SOCs when
probing behind pci_scan_bridge. It is undeseriable to disable
PCI_BRIDGE_CTL_MASTER_ABORT in pci_{read,write)_config_* and the
behaviour wasn't considered a bug in need of a workaround, so this was
put in probe.c.

I don't have an affected system at hand, so can't really test but I
propose something like the below patch.

[1] http://www.linux-mips.org/git?p=linux.git;a=commit;h=599457e0cb702a31a3247ea6a5d9c6c99c4cf195

[PCI] Avoid leaving MASTER_ABORT disabled permanently when returning from pci_scan_bridge.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e3ac86d8 17-Jan-2006 Kristen Accardi <kristen.c.accardi@intel.com>

[PATCH] PCI: really fix parent's subordinate busnr

After you find the maximum value of the subordinate buses below the child
bus, you must fix the parent's subordinate bus number again, otherwise
it may be too small.

Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# ac7dc65a 13-Dec-2005 Benjamin Herrenschmidt <benh@kernel.crashing.org>

[PATCH] PCI: Export pci_cfg_space_size

The powerpc PCI code sets up the PCI tree without doing config space
accesses in most cases, from the firmware tree. However, it still wants
to call pci_cfg_space_size() under some conditions, thus it needs to
be made non-static (though I don't see a point to export it to modules).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 49887941 08-Dec-2005 Dominik Brodowski <linux@dominikbrodowski.net>

[PATCH] PCI: use bus numbers sparsely, if necessary

Add a warning if a child bus may be inaccessible because the
parent bridge has wrong secondary or subordinate bus numbers.
Note that this may or may not happen on "transparent" bridges,
as can be seen in bug #5557.

Also, if we do not fix up the assignment of bus numbers, try to
make use of the bus number space available.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 9d265124 05-Dec-2005 Daniel Yeisley <dan.yeisley@unisys.com>

[PATCH] PCI Quirk: 1K I/O space granularity on Intel P64H2

I've implemented a quirk to take advantage of the 1KB I/O space
granularity option on the Intel P64H2 PCI Bridge. I had to change
probe.c because it sets the resource start and end to be aligned on 4k
boundaries (after the quirk sets them to 1k boundaries). I've tested
this patch on a Unisys ES7000-600 both with and without the 1KB option
enabled. I also tested this on a 2 processor Dell box that doesn't have
a P64H2 to make sure there were no negative affects there.

Signed-off-by: Dan Yeisley <dan.yeisley@unisys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 3efd273b 02-Nov-2005 Kristen Accardi <kristen.c.accardi@intel.com>

[PATCH] pci: call pci_read_irq for bridges

Call pci_read_irq() for bridges too, so that the pin value
is stored for bridges that require interrupts.

Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# ffeff788 02-Nov-2005 Kristen Accardi <kristen.c.accardi@intel.com>

[PATCH] pci: store PCI_INTERRUPT_PIN in pci_dev

Store the value of the INTERRUPT_PIN in the pci_dev structure
so that it can be retrieved later.

Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 8f7020d3 23-Oct-2005 Randy Dunlap <rdunlap@infradead.org>

[PATCH] kernel-doc: PCI fixes

PCI: add descriptions for missing function parameters.
Eliminate all kernel-doc warnings here.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 12f44f46 22-Sep-2005 Ivan Kokshaysky <ink@jurassic.park.msu.ru>

[PATCH] pci: fixup parent subordinate busnr

I believe the change that broke things is introduction of
pci_fixup_parent_subordinate_busnr().

The patch here does two things:
- hunk #1 should fix the problems you've seen when you boot without
additional "pci" kernel options;
- hunk #2 supposedly fixes boot with "pci=assign-busses" option which
otherwise hangs Acer TM81xx machines as reported.

Please try this with and without "pci=assign-busses". If it boots,
I'd like to see 'lspci -vvx' for both cases.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 3c6de929 22-Sep-2005 Amos Waterland <apw@us.ibm.com>

[PATCH] fix drivers/pci/probe.c warning

This function expects an unsigned 32-bit type as its third argument:

static u32 pci_size(u32 base, u32 maxbase, u32 mask)

However, given these definitions:

#define PCI_BASE_ADDRESS_MEM_MASK (~0x0fUL)
#define PCI_ROM_ADDRESS_MASK (~0x7ffUL)

these two calls in drivers/pci/probe.c are problematic for architectures
for which a UL is not equivalent to a u32:

sz = pci_size(l, sz, PCI_BASE_ADDRESS_MEM_MASK);
sz = pci_size(l, sz, PCI_ROM_ADDRESS_MASK);

Hence the below compile warning when building for ARCH=ppc64:

drivers/pci/probe.c: In function `pci_read_bases':
/.../probe.c:168: warning: large integer implicitly truncated to unsigned type
/.../probe.c:218: warning: large integer implicitly truncated to unsigned type

Here is a simple fix.

Signed-off-by: Amos Waterland <apw@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 4327edf6 10-Sep-2005 Alan Cox <alan@lxorguk.ukuu.org.uk>

[PATCH] Subject: PATCH: fix numa caused compile warnings

pcibus_to_cpumask expands into more than just an initialiser so gcc
moans about code before variable declarations.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# cdb9b9f7 05-Sep-2005 Paul Mackerras <paulus@samba.org>

[PATCH] PCI: Small rearrangement of PCI probing code

This patch makes some small rearrangements of the PCI probing code in
order to make it possible for arch code to set up the PCI tree
without needing to duplicate code from the PCI layer unnecessarily.
PPC64 will use this to set up the PCI tree from the Open Firmware
device tree, which we need to do on logically-partitioned pSeries
systems.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 3fe9d19f 17-Aug-2005 Daniel Ritz <daniel.ritz@gmx.ch>

[PATCH] PCI: Support PCM PM CAP version 3

- support PCI PM CAP version 3 (as defined in PCI PM Interface Spec v1.2)

- pci/probe.c sets the PM state initially to 4 which is D3cold. add a
PCI_UNKNOWN

- minor cleanups

Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 982245f0 16-Jul-2005 Adrian Bunk <bunk@stusta.de>

[PATCH] PCI: remove CONFIG_PCI_NAMES

This patch removes CONFIG_PCI_NAMES.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 10f4338c 29-Jul-2005 Ivan Kokshaysky <ink@jurassic.park.msu.ru>

[PATCH] PCI: remove PCI_BRIDGE_CTL_VGA handling from setup-bus.c

The setup-bus code doesn't work correctly for configurations
with more than one display adapter in the same PCI domain.
This stuff actually is a leftover of an early 2.4 PCI setup code
and apparently it stopped working after some "bridge_ctl" changes.
So the best thing we can do is just to remove it and rely on the fact
that any firmware *has* to configure VGA port forwarding for the boot
display device properly.

But then we need to ensure that the bus->bridge_ctl will always
contain valid information collected at the probe time, therefore
the following change in pci_scan_bridge() is needed.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# 90b54929 06-Jun-2005 Ivan Kokshaysky <ink@jurassic.park.msu.ru>

[PATCH] PCI: handle subtractive decode pci-pci bridge better

With the number of PCI bus resources increased to 8, we can
handle the subtractive decode PCI-PCI bridge like a normal
bridge, taking into account standard PCI-PCI bridge windows
(resources 0-2). This helps to avoid problems with peer-to-peer DMA
behind such bridges, poor performance for MMIO ranges outside bridge
windows and prefetchable vs. non-prefetchable memory issues.

To reflect the fact that such bridges do forward all addresses to
the secondary bus (transparency), remaining bus resources 3-7 are
linked to resources 0-4 of the primary bus. These resources will be
used as fallback by resource management code if allocation from
standard bridge windows fails for some reason.

Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 26f674ae 02-Jun-2005 Greg Kroah-Hartman <gregkh@suse.de>

[PATCH] PCI: Fix up PCI routing in parent bridge

When the cardbus bridge is behind another bridge change the routing
in the parent bridge for new cards. This fixes Cardbus on various AMD64
laptops.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 6ef6f0e3 28-Apr-2005 Rajesh Shah <rajesh.shah@intel.com>

[PATCH] acpi bridge hotadd: Link newly created pci child bus to its parent on creation

When a pci child bus is created, add it to the parent's children list
immediately rather than waiting till pci_bus_add_devices(). For hot-plug
bridges/devices, pci_bus_add_devices() may be called much later, after they
have been properly configured. In the meantime, this allows us to use the
normal pci bus search functions for the hot-plug bridges/buses.

Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# e4ea9bb7 28-Apr-2005 Rajesh Shah <rajesh.shah@intel.com>

[PATCH] acpi bridge hotadd: Take the PCI lock when modifying pci bus or device lists

With root bridge and pci bridge hot-plug, new buses and devices can be added
or removed at run time. Protect the pci bus and device lists with the pci
lock when doing so.

Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# cc57450f 28-Apr-2005 Rajesh Shah <rajesh.shah@intel.com>

[PATCH] acpi bridge hotadd: Prevent duplicate bus numbers when scanning PCI bridge

When hot-plugging a root bridge, as we try to assign bus numbers we may find
that the hotplugged hieratchy has more PCI to PCI bridges (i.e. bus
requirements) than available. Make sure we don't step over an existing bus
when that happens.

Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# c431ada4 28-Apr-2005 Rajesh Shah <rajesh.shah@intel.com>

[PATCH] acpi bridge hotadd: ACPI based root bridge hot-add

When you hot-plug a (root) bridge hierarchy, it may have p2p bridges and
devices attached to it that have not been configured by firmware. In this
case, we need to configure the devices before starting them. This patch
separates device start from device scan so that we can introduce the
configuration step in the middle.

I kept the existing semantics for pci_scan_bus() since there are a huge number
of callers to that function.

Also, I have no way of testing the changes I made to the parisc files, so this
needs review by those folks. Sorry for the massive cross-post, this touches
files in many different places.

Signed-off-by: Rajesh Shah <rajesh.shah@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# f797f9cc 13-Jun-2005 Olof Johansson <olof@lixom.net>

[PATCH] Fix PCI BAR size interpretation on 64-bit arches

On 64-bit machines, PCI_BASE_ADDRESS_MEM_MASK and other mask constants
passed to pci_size() are 64-bit (for example ~0x0fUL). However, pci_size
does comparisons between the u32 arguments and the mask, which will fail
even though any result from pci_size is still just 32-bit.

Changing the mask argument to u32 seems the obvious thing to do, since all
arithmetic in the function is 32-bit and having a larger mask makes no
sense.

This triggered on a PPC64 system here where an adapter (VGA, as it
happened) had a memory region base of 0xfe000000 and a sz of the same,
matching the if (max == maxbase ...) test at the bottom of pci_size but
failing the mask comparison. Quite a corner case which I guess explains
why we haven't seen it until now.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>


# bc56b9e0 07-Apr-2005 Greg Kroah-Hartman <gregkh@suse.de>

[PATCH] PCI: Clean up a lot of sparse "Should it be static?" warnings.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


# 1da177e4 16-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org>

Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!