History log of /linux-master/drivers/pci/pcie/aer.c
Revision Date Author Comments
# 0a5a46a6 06-Feb-2024 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

PCI/AER: Generalize TLP Header Log reading

Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after
AER as the struct aer_header_log_regs. Also, not all places that handle TLP
Header Log use the struct and the struct members are named individually.

Generalize the struct name and members, and use it consistently where TLP
Header Log is being handled so that a pcie_read_tlp_log() helper can be
easily added.

Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop ixgbe changes for now, tidy whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a37e12bc 06-Feb-2024 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

PCI/AER: Use explicit register size for PCI_ERR_CAP

Use u32 for PCIe AER Capability register variable and name it "aercc"
(Advanced Error Capabilities and Control register, PCIe r6.1 sec 7.8.4.7)
instead of "temp".

Link: https://lore.kernel.org/r/20240206135717.8565-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: make subject more specific and match similar previous patches]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# db02e176 06-Dec-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Use explicit register sizes for struct members

aer_irq() reads the AER Root Error Status and Error Source Identification
(PCI_ERR_ROOT_STATUS and PCI_ERR_ROOT_ERR_SRC) registers directly into
struct aer_err_source. Both registers are 32 bits, so declare the members
explicitly as "u32" instead of "unsigned int".

Similarly, aer_get_device_error_info() reads the AER Header Log
(PCI_ERR_HEADER_LOG) registers, which are also 32 bits, into struct
aer_header_log_regs. Declare those members as "u32" as well.

No functional changes intended.

Link: https://lore.kernel.org/r/20231206224231.732765-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


# 1291b716 06-Dec-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Decode Requester ID when no error info found

When a device with AER detects an error, it logs error information in its
own AER Error Status registers. It may send an Error Message to the Root
Port (RCEC in the case of an RCiEP), which logs the fact that an Error
Message was received (Root Error Status) and the Requester ID of the
message source (Error Source Identification).

aer_print_port_info() prints the Requester ID from the Root Port Error
Source in the usual Linux "bb:dd.f" format, but when find_source_device()
finds no error details in the hierarchy below the Root Port, it printed the
raw Requester ID without decoding it.

Decode the Requester ID in the usual Linux format so it matches other
messages.

Sample message changes:

- pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
- pcieport 0000:00:1c.5: AER: can't find device of ID00e5
+ pcieport 0000:00:1c.5: AER: Correctable error message received from 0000:00:1c.5
+ pcieport 0000:00:1c.5: AER: found no error details for 0000:00:1c.5

Link: https://lore.kernel.org/r/20231206224231.732765-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 02a06f5f 06-Dec-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected". To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected. The recovery agent must report an uncorrectable error if it is
unable to recover. If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

- pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
+ pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Link: https://lore.kernel.org/r/20231206224231.732765-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# b7e9392d 18-Oct-2023 Robert Richter <rrichter@amd.com>

PCI/AER: Unmask RCEC internal errors to enable RCH downstream port error handling

AER corrected and uncorrectable internal errors (CIE/UIE) are masked
in their corresponding mask registers per default once in power-up
state. [1][2] Enable internal errors for RCECs to receive CXL
downstream port errors of Restricted CXL Hosts (RCHs).

[1] CXL 3.0 Spec, 12.2.1.1 - RCH Downstream Port Detected Errors
[2] PCIe Base Spec r6.0, 7.8.4.3 Uncorrectable Error Mask Register,
7.8.4.6 Correctable Error Mask Register

Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-19-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 0a867568 18-Oct-2023 Robert Richter <rrichter@amd.com>

PCI/AER: Forward RCH downstream port-detected errors to the CXL.mem dev handler

In Restricted CXL Device (RCD) mode a CXL device is exposed as an
RCiEP, but CXL downstream and upstream ports are not enumerated and
not visible in the PCIe hierarchy. [1] Protocol and link errors from
these non-enumerated ports are signaled as internal AER errors, either
Uncorrectable Internal Error (UIE) or Corrected Internal Errors (CIE)
via an RCEC.

Restricted CXL host (RCH) downstream port-detected errors have the
Requester ID of the RCEC set in the RCEC's AER Error Source ID
register. A CXL handler must then inspect the error status in various
CXL registers residing in the dport's component register space (CXL
RAS capability) or the dport's RCRB (PCIe AER extended
capability). [2]

Errors showing up in the RCEC's error handler must be handled and
connected to the CXL subsystem. Implement this by forwarding the error
to all CXL devices below the RCEC. Since the entire CXL device is
controlled only using PCIe Configuration Space of device 0, function
0, only pass it there [3]. The error handling is limited to currently
supported devices with the Memory Device class code set (CXL Type 3
Device, PCI_CLASS_MEMORY_CXL, 502h), handle downstream port errors in
the device's cxl_pci driver. Support for other CXL Device Types
(e.g. a CXL.cache Device) can be added later.

To handle downstream port errors in addition to errors directed to the
CXL endpoint device, a handler must also inspect the CXL RAS and PCIe
AER capabilities of the CXL downstream port the device is connected
to.

Since CXL downstream port errors are signaled using internal errors,
the handler requires those errors to be unmasked. This is subject of a
follow-on patch.

The reason for choosing this implementation is that the AER service
driver claims the RCEC device, but does not allow it to register a
custom specific handler to support CXL. Connecting the RCEC hard-wired
with a CXL handler does not work, as the CXL subsystem might not be
present all the time. The alternative to add an implementation to the
portdrv to allow the registration of a custom RCEC error handler isn't
worth doing it as CXL would be its only user. Instead, just check for
an CXL RCEC and pass it down to the connected CXL device's error
handler. With this approach the code can entirely be implemented in
the PCIe AER driver and is independent of the CXL subsystem. The CXL
driver only provides the handler.

[1] CXL 3.0 spec: 9.11.8 CXL Devices Attached to an RCH
[2] CXL 3.0 spec, 12.2.1.1 RCH Downstream Port-detected Errors
[3] CXL 3.0 spec, 8.1.3 PCIe DVSEC for CXL Devices

Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-pci@vger.kernel.org
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-18-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 6777877e 18-Oct-2023 Terry Bowman <terry.bowman@amd.com>

PCI/AER: Refactor cper_print_aer() for use by CXL driver module

The CXL driver plans to use cper_print_aer() for logging restricted CXL
host (RCH) AER errors. cper_print_aer() is not currently exported and
therefore not usable by the CXL drivers built as loadable modules. Export
the cper_print_aer() function. Use the EXPORT_SYMBOL_NS_GPL() variant
to restrict the export to CXL drivers.

The CONFIG_ACPI_APEI_PCIEAER kernel config is currently used to enable
cper_print_aer(). cper_print_aer() logs the AER registers and is
useful in PCIE AER logging outside of APEI. Remove the
CONFIG_ACPI_APEI_PCIEAER dependency to enable cper_print_aer().

The cper_print_aer() function name implies CPER specific use but is useful
in non-CPER cases as well. Rename cper_print_aer() to pci_print_aer().

Also, update cxl_core to import CXL namespace imports.

Co-developed-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231018171713.1883517-13-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 13cf36c6 11-May-2023 Kai-Heng Feng <kai.heng.feng@canonical.com>

PCI/AER: Factor out interrupt toggling into helpers

There are many places that enable and disable AER interrupt, so move
them into helpers.

Link: https://lore.kernel.org/r/20230512000014.118942-1-kai.heng.feng@canonical.com
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


# e2abc47a 20-Sep-2023 Shiju Jose <shiju.jose@huawei.com>

ACPI: APEI: Fix AER info corruption when error status data has multiple sections

ghes_handle_aer() passes AER data to the PCI core for logging and
recovery by calling aer_recover_queue() with a pointer to struct
aer_capability_regs.

The problem was that aer_recover_queue() queues the pointer directly
without copying the aer_capability_regs data. The pointer was to
the ghes->estatus buffer, which could be reused before
aer_recover_work_func() reads the data.

To avoid this problem, allocate a new aer_capability_regs structure
from the ghes_estatus_pool, copy the AER data from the ghes->estatus
buffer into it, pass a pointer to the new struct to
aer_recover_queue(), and free it after aer_recover_work_func() has
processed it.

Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 49f77672 23-Aug-2023 Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

PCI/AER: Export pcie_aer_is_native()

Export and move the declaration of pcie_aer_is_native() to a common header
file to be reused by cxl/pci module.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230823234305.27333-3-Smita.KoralahalliChannabasappa@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 95881c86 27-May-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Simplify AER_RECOVER_RING_SIZE definition

ACPI Platform Error Interfaces (APEI) convey error information to the OS.
If the APEI GHES driver receives information about PCI errors, it queues it
in aer_recover_ring for processing by the PCI AER code.

AER_RECOVER_RING_SIZE is the size of the aer_recover_ring FIFO and is
arbitrary, with no direct connection to hardware.

AER_RECOVER_RING_ORDER was only used to compute AER_RECOVER_RING_SIZE.
Remove it and define AER_RECOVER_RING_SIZE directly. No functional change
intended.

Link: https://lore.kernel.org/r/20230824193712.542167-7-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>


# f7f7c3d6 07-Aug-2023 Xiongfeng Wang <wangxiongfeng2@huawei.com>

PCI/AER: Use pci_dev_id() to simplify the code

When we have a struct pci_dev *, use pci_dev_id() instead of manually
composing the ID with PCI_DEVID() from dev->bus->number and dev->devfn.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/20230807134858.116051-3-wangxiongfeng2@huawei.com
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7ec4b34b 10-Jul-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Unexport pci_enable_pcie_error_reporting()

pci_enable_pcie_error_reporting() is used only inside aer.c. Stop exposing
it outside the file.

Link: https://lore.kernel.org/r/20230710232136.233034-3-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 69b264df 10-Jul-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Drop unused pci_disable_pcie_error_reporting()

pci_disable_pcie_error_reporting() has no callers. Remove it.

Link: https://lore.kernel.org/r/20230710232136.233034-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 6b985af5 18-Jan-2023 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Remove redundant Device Control Error Reporting Enable

The following bits in the PCIe Device Control register enable sending of
ERR_COR, ERR_NONFATAL, or ERR_FATAL Messages (or reporting internally in
the case of Root Ports):

Correctable Error Reporting Enable
Non-Fatal Error Reporting Enable
Fatal Error Reporting Enable
Unsupported Request Reporting Enable

These enable bits are set by pci_enable_pcie_error_reporting(), and since
f26e58bf6f54 ("PCI/AER: Enable error reporting when AER is native"), we
do that in this path during enumeration:

pci_init_capabilities
pci_aer_init
pci_enable_pcie_error_reporting

Previously, the AER service driver also traversed the hierarchy when
claiming a Root Port, enabling error reporting for downstream devices, but
this is redundant.

Remove the code that enables this error reporting in the AER .probe() path.
Also remove similar code that disables error reporting in the AER .remove()
path.

Note that these Device Control Reporting Enable bits do not control
interrupt generation. That's done by the similarly-named bits in the AER
Root Error Command register, which are still set by aer_probe() and cleared
by aer_remove(), since the AER service driver handles those interrupts.
See PCIe r6.0, sec 6.2.6.

Link: https://lore.kernel.org/r/20230118234612.272916-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Stefan Roese <sr@denx.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <kbusch@kernel.org>


# bba50659 11-Jan-2023 Vidya Sagar <vidyas@nvidia.com>

PCI/AER: Configure ECRC only if AER is native

As the ECRC configuration bits are part of AER registers, configure ECRC
only if AER is natively owned by the kernel.

Link: https://lore.kernel.org/r/20230112072111.20063-1-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 361187e0 30-Nov-2022 Dave Jiang <dave.jiang@intel.com>

PCI/AER: Add optional logging callback for correctable error

Some new devices such as CXL devices may want to record additional error
information on a corrected error. Add a callback to allow the PCI device
driver to do additional logging such as providing additional stats for user
space RAS monitoring.

For CXL device, this is actually a need due to CXL needing to write to the
CXL RAS capability structure correctable error status register in order to
clear the unmasked correctable errors. See CXL spec rev3.0 8.2.4.16.

Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166984619233.2804404.3966368388544312674.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 5e6ae050 09-May-2022 Mohamed Khalfella <mkhalfella@purestorage.com>

PCI/AER: Iterate over error counters instead of error strings

Previously we iterated over AER stat *names*, e.g.,
aer_correctable_error_string[32], but the actual stat *counters* may not be
that large, e.g., pdev->aer_stats->dev_cor_errs[16], which means that we
printed junk in the sysfs stats files.

Iterate over the stat counter arrays instead of the names to avoid this
junk.

Also, added a build time check to make sure all
counters have entries in strings array.

Fixes: 0678e3109a3c ("PCI/AER: Simplify __aer_print_error()")
Link: https://lore.kernel.org/r/20220509181441.31884-1-mkhalfella@purestorage.com
Reported-by: Meeta Saggi <msaggi@purestorage.com>
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Meeta Saggi <msaggi@purestorage.com>
Reviewed-by: Eric Badger <ebadger@purestorage.com>
Cc: stable@vger.kernel.org


# f26e58bf 25-Jan-2022 Stefan Roese <sr@denx.de>

PCI/AER: Enable error reporting when AER is native

If we have native control of AER, set the following error reporting enable
bits:

- Correctable Error Reporting Enable
- Non-Fatal Error Reporting Enable
- Fatal Error Reporting Enable
- Unsupported Request Reporting Enable

Note that these bits are all in the Device Control register and are not
AER-specific.

This affects all devices with an AER capability, including hot-added
devices.

Please note that this change is quite invasive, as error reporting now will
be enabled for all available PCIe Endpoints, which was previously not the
case.

When "pci=noaer" is selected, error reporting stays disabled of course.

[bhelgaas: commit log, note error reporting is not AER-specific]
Link: https://lore.kernel.org/r/20220125071820.2247260-4-sr@denx.de
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
Cc: Naveen Naidu <naveennaidu479@gmail.com>


# 9ffb98f1 25-Jan-2022 Stefan Roese <sr@denx.de>

PCI/AER: Configure ECRC for every device

Move pcie_set_ecrc_checking() to pci_aer_init() to make sure that
pcie_set_ecrc_checking() is called for each PCIe device, including
hot-added devices.

Link: https://lore.kernel.org/r/20220125071820.2247260-2-sr@denx.de
Signed-off-by: Stefan Roese <sr@denx.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Pali Rohár <pali@kernel.org>
Cc: Bharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Yao Hongbo <yaohongbo@linux.alibaba.com>
Cc: Naveen Naidu <naveennaidu479@gmail.com>


# 203926da 18-Apr-2022 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Clear MULTI_ERR_COR/UNCOR_RCV bits

When a Root Port or Root Complex Event Collector receives an error Message
e.g., ERR_COR, it sets PCI_ERR_ROOT_COR_RCV in the Root Error Status
register and logs the Requester ID in the Error Source Identification
register. If it receives a second ERR_COR Message before software clears
PCI_ERR_ROOT_COR_RCV, hardware sets PCI_ERR_ROOT_MULTI_COR_RCV and the
Requester ID is lost.

In the following scenario, PCI_ERR_ROOT_MULTI_COR_RCV was never cleared:

- hardware receives ERR_COR message
- hardware sets PCI_ERR_ROOT_COR_RCV
- aer_irq() entered
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_COR_RCV
- hardware receives second ERR_COR message
- hardware sets PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq(): pci_write_config_dword(PCI_ERR_ROOT_STATUS, status)
- PCI_ERR_ROOT_COR_RCV is cleared; PCI_ERR_ROOT_MULTI_COR_RCV is set
- aer_irq() entered again
- aer_irq(): status = pci_read_config_dword(PCI_ERR_ROOT_STATUS)
- aer_irq(): now status == PCI_ERR_ROOT_MULTI_COR_RCV
- aer_irq() exits because PCI_ERR_ROOT_COR_RCV not set
- PCI_ERR_ROOT_MULTI_COR_RCV is still set

The same problem occurred with ERR_NONFATAL/ERR_FATAL Messages and
PCI_ERR_ROOT_UNCOR_RCV and PCI_ERR_ROOT_MULTI_UNCOR_RCV.

Fix the problem by queueing an AER event and clearing the Root Error Status
bits when any of these bits are set:

PCI_ERR_ROOT_COR_RCV
PCI_ERR_ROOT_UNCOR_RCV
PCI_ERR_ROOT_MULTI_COR_RCV
PCI_ERR_ROOT_MULTI_UNCOR_RCV

See the bugzilla link for details from Eric about how to reproduce this
problem.

[bhelgaas: commit log, move repro details to bugzilla]
Fixes: e167bfcaa4cd ("PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215992
Link: https://lore.kernel.org/r/20220418150237.1021519-1-sathyanarayanan.kuppuswamy@linux.intel.com
Reported-by: Eric Badger <ebadger@purestorage.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>


# b2105b9f 06-Oct-2021 Krzysztof Wilczyński <kw@linux.com>

PCI: Correct misspelled and remove duplicated words

Correct a number of misspelled words and remove any words that were
duplicated in the PCI tree. No change to functionality intended.

Link: https://lore.kernel.org/r/20211006233827.147328-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 9bdc81ce 17-Aug-2021 Amey Narkhede <ameynarkhede03@gmail.com>

PCI: Change the type of probe argument in reset functions

Change the type of probe argument in functions which implement reset
methods from int to bool to make the context and intent clear.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Link: https://lore.kernel.org/r/20210817180500.1253-10-ameynarkhede03@gmail.com
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 56f107d7 17-Aug-2021 Amey Narkhede <ameynarkhede03@gmail.com>

PCI: Add pcie_reset_flr() with 'probe' argument

Most reset methods are of the form "pci_*_reset(dev, probe)". pcie_flr()
was an exception because it relied on a separate pcie_has_flr() function
instead of taking a "probe" argument.

Add "pcie_reset_flr(dev, probe)" to follow the convention. Remove
pcie_has_flr().

Some pcie_flr() callers that did not use pcie_has_flr() remain.

[bhelgaas: commit log, rework pcie_reset_flr() to use dev->devcap directly]
Link: https://lore.kernel.org/r/20210817180500.1253-3-ameynarkhede03@gmail.com
Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>


# f8cf6e51 02-Jun-2021 Krzysztof Wilczyński <kw@linux.com>

PCI/sysfs: Use sysfs_emit() and sysfs_emit_at() in "show" functions

The sysfs_emit() and sysfs_emit_at() functions were introduced to make
it less ambiguous which function is preferred when writing to the output
buffer in a device attribute's "show" callback [1].

Convert the PCI sysfs object "show" functions from sprintf(), snprintf()
and scnprintf() to sysfs_emit() and sysfs_emit_at() accordingly, as the
latter is aware of the PAGE_SIZE buffer and correctly returns the number
of bytes written into the buffer.

No functional change intended.

[1] Documentation/filesystems/sysfs.rst

Related commit: ad025f8e46f3 ("PCI/sysfs: Use sysfs_emit() and
sysfs_emit_at() in "show" functions").

Link: https://lore.kernel.org/r/20210603000112.703037-2-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>


# 95ea9539 29-Mar-2021 Yicong Yang <yangyicong@hisilicon.com>

PCI/AER: Use consistent format when printing PCI device

We use format domain:bus:slot.function when printing PCI device. Use
consistent format in AER messages.

[bhelgaas: also drop "AER recover:" prefix since we already have an "AER:"
prefix from pr_fmt()]
Link: https://lore.kernel.org/r/1617015721-51701-1-git-send-email-yangyicong@hisilicon.com
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kw@linux.com>


# 43395d9e 10-Mar-2021 Krzysztof Wilczyński <kw@linux.com>

PCI: Fix kernel-doc errors

Fix kernel-doc formatting errors, function names that don't match the doc,
and some missing parameter documentation. These are reported by:

make W=1 drivers/pci/

No functional change intended.

[bhelgaas: squashed into one patch since this only changes comments]
Link: https://lore.kernel.org/r/20210311001724.423356-1-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-2-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-3-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-4-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-5-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-6-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-7-kw@linux.com
Link: https://lore.kernel.org/r/20210311001724.423356-8-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 33ac78bd 04-Jan-2021 Keith Busch <kbusch@kernel.org>

PCI/AER: Specify the type of Port that was reset

The AER driver may be called upon to reset either a Downstream or a Root
Port. Check which type it is to properly identify it when logging that
the reset occurred.

Link: https://lore.kernel.org/r/20210104230300.1277180-5-kbusch@kernel.org
Tested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Hedi Berriche <hedi.berriche@hpe.com>


# 7a8a22be 04-Jan-2021 Keith Busch <kbusch@kernel.org>

PCI/AER: Clear AER status from Root Port when resetting Downstream Port

The pci_dev parameter given to aer_root_reset() may be a Downstream Port
rather than the Root Port. Get the Root Port from the provided device in
order to clear the root's AER status.

Link: https://lore.kernel.org/r/20210104230300.1277180-3-kbusch@kernel.org
Tested-by: Hedi Berriche <hedi.berriche@hpe.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Sean V Kelley <sean.v.kelley@intel.com>
Acked-by: Hedi Berriche <hedi.berriche@hpe.com>


# af113553 20-Nov-2020 Sean V Kelley <sean.v.kelley@intel.com>

PCI/AER: Add pcie_walk_rcec() to RCEC AER handling

Root Complex Event Collectors (RCEC) appear as peers to Root Ports and also
have the AER capability. In addition, actions need to be taken for
associated RCiEPs. In such cases the RCECs will need to be walked in order
to find and act upon their respective RCiEPs.

Extend the existing ability to link the RCECs with a walking function
pcie_walk_rcec(). Add RCEC support to the current AER service driver and
attach the AER service driver to the RCEC device.

Co-developed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-14-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 57908622 20-Nov-2020 Qiuxu Zhuo <qiuxu.zhuo@intel.com>

PCI/ERR: Recover from RCiEP AER errors

Add support for handling AER errors detected by Root Complex Integrated
Endpoints (RCiEPs). These errors are signaled to software natively via a
Root Complex Event Collector (RCEC) or non-natively via ACPI APEI if the
platform retains control of AER or uses a non-standard RCEC-like device.

When recovering from RCiEP errors, the Root Error Command and Status
registers are in the AER Capability of an associated RCEC (if any), not in
a Root Port. In the non-native case, the platform is responsible for those
registers and we can't touch them.

[bhelgaas: commit log, etc]
Co-developed-by: Sean V Kelley <sean.v.kelley@intel.com>
Link: https://lore.kernel.org/r/20201121001036.8560-13-sean.v.kelley@intel.com
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a175102b 02-Dec-2020 Sean V Kelley <sean.v.kelley@intel.com>

PCI/ERR: Recover from RCEC AER errors

A Root Complex Event Collector (RCEC) collects and signals AER errors that
were detected by Root Complex Integrated Endpoints (RCiEPs), but it may
also signal errors it detects itself. This is analogous to errors detected
and signaled by a Root Port.

Update the AER service driver to claim RCECs in addition to Root Ports.
Add support for handling RCEC-detected AER errors. This does not
include handling RCiEP-detected errors that are signaled by the RCEC.

Note that we expect these errors only from the native AER and APEI paths,
not from DPC or EDR.

[bhelgaas: split from combined RCEC/RCiEP patch, commit log]
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 480ef7cb 20-Nov-2020 Sean V Kelley <sean.v.kelley@intel.com>

PCI/ERR: Simplify by computing pci_pcie_type() once

Instead of calling pci_pcie_type(dev) twice, call it once and save the
result. No functional change intended.

Link: https://lore.kernel.org/r/20201121001036.8560-7-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


# 50cc18fc 20-Nov-2020 Sean V Kelley <sean.v.kelley@intel.com>

PCI/AER: Write AER Capability only when we control it

If an OS has not been granted AER control via _OSC, it should not make
changes to PCI_ERR_ROOT_COMMAND and PCI_ERR_ROOT_STATUS related registers.
Per section 4.5.1 of the System Firmware Intermediary (SFI) _OSC and DPC
Updates ECN [1], this bit also covers these aspects of the PCI Express
Advanced Error Reporting. Based on the above and earlier discussion [2],
make the following changes:

Add a check for the native case (i.e., AER control via _OSC)

Note that the previous "clear, reset, enable" order suggests that the reset
might cause errors that we should ignore. After this commit, those errors
(if any) will remain logged in the PCI_ERR_ROOT_STATUS register.

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
2020, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/14076
[2] https://lore.kernel.org/linux-pci/20201020162820.GA370938@bjorn-Precision-5520/

Link: https://lore.kernel.org/r/20201121001036.8560-2-sean.v.kelley@intel.com
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> # non-native/no RCEC
Signed-off-by: Sean V Kelley <sean.v.kelley@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 068c29a2 22-Jun-2020 Jonathan Cameron <Jonathan.Cameron@huawei.com>

PCI/ERR: Clear PCIe Device Status errors only if OS owns AER

pcie_clear_device_status() resets the error bits in the PCIe Device Status
Register (PCI_EXP_DEVSTA).

Previously we did this unconditionally, but on ACPI systems, the _OSC AER
bit negotiates control of the AER capability. Per sec 4.5.1 of the System
Firmware Intermediary _OSC and DPC Updates ECN [1], this bit also covers
other error enable/status bits including the following:

Correctable Error Reporting Enable
Non-Fatal Error Reporting Enable
Fatal Error Reporting Enable
Unsupported Request Reporting Enable

These bits are all in the PCIe Device Control register (the ECN omitted
"Reporting", but I think that's a typo), so by implication the _OSC AER bit
also applies to the error status bits in the PCIe Device Status register:

Correctable Error Detected
Non-Fatal Error Detected
Fatal Error Detected
Unsupported Request Detected

Clear the PCIe Device Status error bits only when the OS controls the AER
capability and related error enable/status bits. If platform firmware
controls the AER capability, firmware is responsible for clearing these
bits.

One call path leading here is:

ghes_do_proc
ghes_handle_aer
aer_recover_queue
schedule_work(&aer_recover_work)
...
aer_recover_work_func
pcie_do_recovery
pcie_clear_device_status

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
2020, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/14076
[bhelgaas: commit log, move test from pcie_clear_device_status() to callers]
Link: https://lore.kernel.org/r/20200622113523.891666-1-Jonathan.Cameron@huawei.com
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 600a5b4f 16-Jul-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI/ERR: Rename pci_aer_clear_device_status() to pcie_clear_device_status()

pci_aer_clear_device_status() clears the error bits in the PCIe Device
Status Register (PCI_EXP_DEVSTA). Every PCIe device has this register,
regardless of whether it supports AER.

Rename pci_aer_clear_device_status() to pcie_clear_device_status() to make
clear that it is PCIe-specific but not AER-specific. Move it to
drivers/pci/pci.c, again since it's not AER-specific. No functional change
intended.

Link: https://lore.kernel.org/r/20200717195619.766662-1-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3f649ab7 03-Jun-2020 Kees Cook <keescook@chromium.org>

treewide: Remove uninitialized_var() usage

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
xargs perl -pi -e \
's/\buninitialized_var\(([^\)]+)\)/\1/g;
s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>


# e83e2ca3 18-Jun-2020 Matt Jolly <Kangie@footclan.ninja>

PCI/AER: Log correctable errors as warning, not error

PCIe correctable errors are recovered by hardware with no need for software
intervention (PCIe r5.0, sec 6.2.2.1).

Reduce the log level of correctable errors from KERN_ERR to KERN_WARNING.

The bug reports below are for correctable error logging. This doesn't fix
the cause of those reports, but it may make the messages less alarming.

[bhelgaas: commit log, use pci_printk() to avoid code duplication]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201517
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196183
Link: https://lore.kernel.org/r/20200618155511.16009-1-Kangie@footclan.ninja
Signed-off-by: Matt Jolly <Kangie@footclan.ninja>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 0678e310 07-Jul-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Simplify __aer_print_error()

aer_correctable_error_string[] and aer_uncorrectable_error_string[] have
descriptions of AER error status bits. Add NULL entries to these tables so
all entries for bits 0-31 are defined. Then we don't have to check for
ARRAY_SIZE() when decoding a status word, which simplifies
__aer_print_error().

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d20df83b 15-Jun-2020 Bolarinwa Olayemi Saheed <refactormyself@gmail.com>

PCI: Convert PCIe capability PCIBIOS errors to errno

The PCI config accessors (pci_read_config_word(), et al) return
PCIBIOS_SUCCESSFUL (zero) or positive error values like
PCIBIOS_FUNC_NOT_SUPPORTED.

The PCIe capability accessors (pcie_capability_read_word(), et al)
similarly return PCIBIOS errors, but some callers assume they return
generic errno values like -EINVAL.

For example, the Myri-10G probe function returns a positive PCIBIOS error
if the pcie_capability_clear_and_set_word() in pcie_set_readrq() fails:

myri10ge_probe
status = pcie_set_readrq
return pcie_capability_clear_and_set_word
if (status)
return status

A positive return from a PCI driver probe function would cause a "Driver
probe function unexpectedly returned" warning from local_pci_probe()
instead of the desired probe failure.

Convert PCIBIOS errors to generic errno for all callers of:

pcie_capability_read_word
pcie_capability_read_dword
pcie_capability_write_word
pcie_capability_write_dword
pcie_capability_set_word
pcie_capability_set_dword
pcie_capability_clear_word
pcie_capability_clear_dword
pcie_capability_clear_and_set_word
pcie_capability_clear_and_set_dword

that check the return code for anything other than zero.

[bhelgaas: commit log, squash together]
Suggested-by: Bjorn Helgaas <bjorn@helgaas.com>
Link: https://lore.kernel.org/r/20200615073225.24061-1-refactormyself@gmail.com
Signed-off-by: Bolarinwa Olayemi Saheed <refactormyself@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 07b2fbb5 29-May-2020 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Use "aer" variable for capability offset

Previously we used "pos" or "aer_pos" for the offset of the AER Capability.
Use "aer" consistently and initialize it the same way everywhere. No
functional change intended.

Link: https://lore.kernel.org/r/20200529230915.GA479883@bjorn-Precision-5520
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# af10cce7 26-May-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Remove redundant dev->aer_cap checks

pcie_aer_get_firmware_first() checks dev->aer_cap, so we can remove
redundant dev->aer_cap checks in the callers.

Link: https://lore.kernel.org/r/d5ccc7a060ec9cdc234bdae7df8a0a4410f13f42.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 123f985a 26-May-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Remove redundant pci_is_pcie() checks

AER is a PCIe Extended Capability, so dev->aer_cap will only be set for
PCIe devices. Remove redundant pci_is_pcie() checks.

Link: https://lore.kernel.org/r/361c622eabe5b845b8092e0bec04a3a2c262cb38.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 708b2000 26-May-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Remove HEST/FIRMWARE_FIRST parsing for AER ownership

Commit c100beb9ccfb ("PCI/AER: Use only _OSC to determine AER ownership")
removed the use of HEST in determining AER ownership, but the AER driver
still used HEST to verify AER ownership in some of its APIs.

Per the ACPI spec v6.3, sec 18.3.2.4, some HEST table entries contain a
FIRMWARE_FIRST bit, but that bit does not tell us anything about ownership
of the AER capability.

Remove parsing of HEST to look for FIRMWARE_FIRST.

Add pcie_aer_is_native() for the places that need to know whether the OS
owns the AER capability.

[bhelgaas: commit log, reorder patch, remove unused __aer_firmware_first]
Link: https://lore.kernel.org/r/9a37f53a4e6ff4942ff8e18dbb20b00e16c47341.1590534843.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# c100beb9 27-Apr-2020 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI/AER: Use only _OSC to determine AER ownership

Per the PCI Firmware spec, r3.2, sec 4.5.1, the OS can request control of
AER via bit 3 of the _OSC Control Field. In the returned value of the
Control Field:

The firmware sets [bit 3] to 1 to grant control over PCI Express Advanced
Error Reporting. ... after control is transferred to the operating
system, firmware must not modify the Advanced Error Reporting Capability.
If control of this feature was requested and denied or was not requested,
firmware returns this bit set to 0.

Previously the pci_root driver looked at the HEST FIRMWARE_FIRST bit to
determine whether to request ownership of the AER Capability. This was
based on ACPI spec v6.3, sec 18.3.2.4, and similar sections, which say
things like:

Bit [0] - FIRMWARE_FIRST: If set, indicates that system firmware will
handle errors from this source first.

Bit [1] - GLOBAL: If set, indicates that the settings contained in this
structure apply globally to all PCI Express Devices.

These ACPI references don't say anything about ownership of the AER
Capability.

Remove use of the FIRMWARE_FIRST bit and rely only on the _OSC bit to
determine whether we have control of the AER Capability.

Link: https://lore.kernel.org/r/20181115231605.24352-1-mr.nuke.me@gmail.com/ v1
Link: https://lore.kernel.org/r/20190326172343.28946-1-mr.nuke.me@gmail.com/ v2
Link: https://lore.kernel.org/r/67af2931705bed9a588b5a39d369cb70b9942190.1587925636.git.sathyanarayanan.kuppuswamy@linux.intel.com
[bhelgaas: commit log, note: Alex posted this identical patch 18 months
ago, and I failed to apply it then, so I made him the author, added links
to his postings, and added his Signed-off-by]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>


# 894020fd 23-Mar-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Rationalize error status register clearing

The AER interfaces to clear error status registers were a confusing mess:

- pci_cleanup_aer_uncorrect_error_status() cleared non-fatal errors
from the Uncorrectable Error Status register.

- pci_aer_clear_fatal_status() cleared fatal errors from the
Uncorrectable Error Status register.

- pci_cleanup_aer_error_status_regs() cleared the Root Error Status
register (for Root Ports), the Uncorrectable Error Status register,
and the Correctable Error Status register.

Rename them to make them consistent:

From To
---------------------------------------- -------------------------------
pci_cleanup_aer_uncorrect_error_status() pci_aer_clear_nonfatal_status()
pci_aer_clear_fatal_status() pci_aer_clear_fatal_status()
pci_cleanup_aer_error_status_regs() pci_aer_clear_status()

Since pci_cleanup_aer_error_status_regs() (renamed to
pci_aer_clear_status()) is only used within drivers/pci/, move the
declaration from <linux/aer.h> to drivers/pci/pci.h.

[bhelgaas: commit log, add renames]
Link: https://lore.kernel.org/r/d1310a75dc3d28f7e8da4e99c45fbd3e60fe238e.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 20e15e67 23-Mar-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/AER: Add pci_aer_raw_clear_status() to unconditionally clear Error Status

Per the SFI _OSC and DPC Updates ECN [1] implementation note flowchart, the
OS seems to be expected to clear AER status even if it doesn't have
ownership of the AER capability. Unlike the DPC capability, where a DPC
ECN [2] specifies a window when the OS is allowed to access DPC registers
even if it doesn't have ownership, there is no clear model for AER.

Add pci_aer_raw_clear_status() to clear the AER error status registers
unconditionally. This is intended for use only by the EDR path (see [2]).

[1] System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
2020, affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/14076
[2] Downstream Port Containment Related Enhancements ECN, Jan 28, 2019,
affecting PCI Firmware Specification, Rev. 3.2
https://members.pcisig.com/wg/PCI-SIG/document/12888

[bhelgaas: changelog]
Link: https://lore.kernel.org/r/c19ad28f3633cce67448609e89a75635da0da07d.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# b6cf1a42 23-Mar-2020 Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

PCI/ERR: Remove service dependency in pcie_do_recovery()

Previously we passed the PCIe service type parameter to pcie_do_recovery(),
where reset_link() looked up the underlying pci_port_service_driver and its
.reset_link() function pointer. Instead of using this roundabout way, we
can just pass the driver-specific .reset_link() callback function when
calling pcie_do_recovery() function.

This allows us to call pcie_do_recovery() from code that is not a PCIe port
service driver, e.g., Error Disconnect Recover (EDR) support.

Remove pcie_port_find_service() and pcie_port_service_driver.reset_link
since they are now unused.

Link: https://lore.kernel.org/r/60e02b87b526cdf2930400059d98704bf0a147d1.1585000084.git.sathyanarayanan.kuppuswamy@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# d95f20c4 23-Jan-2020 Dongdong Liu <liudongdong3@huawei.com>

PCI/AER: Initialize aer_fifo

Previously we did not call INIT_KFIFO() for aer_fifo. This leads to
kfifo_put() sometimes returning 0 (queue full) when in fact it is not.

It is easy to reproduce the problem by using aer-inject:

$ aer-inject -s :82:00.0 multiple-corr-nonfatal

The content of the multiple-corr-nonfatal file is as below:

AER
COR RCVR
HL 0 1 2 3
AER
UNCOR POISON_TLP
HL 4 5 6 7

Fixes: 27c1ce8bbed7 ("PCI/AER: Use kfifo for tracking events instead of reimplementing it")
Link: https://lore.kernel.org/r/1579767991-103898-1-git-send-email-liudongdong3@huawei.com
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 161eea1b 27-Aug-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI/AER: Fix kernel-doc warnings

Kernel-doc validator complains:

aer.c:207: warning: Function parameter or member 'str' not described in 'pcie_ecrc_get_policy'
aer.c:1209: warning: Function parameter or member 'irq' not described in 'aer_isr'
aer.c:1209: warning: Function parameter or member 'context' not described in 'aer_isr'
aer.c:1209: warning: Excess function parameter 'work' description in 'aer_isr'

Fix the above accordingly.

Link: https://lore.kernel.org/r/20190827151823.75312-2-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 6a8c9734 27-Aug-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI/AER: Use for_each_set_bit() to simplify code

Simplify error counting code by using for_each_set_bit() library function.

Link: https://lore.kernel.org/r/20190827151823.75312-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


# 6458b438 27-Aug-2019 Rajat Jain <rajatja@google.com>

PCI/AER: Add PoisonTLPBlocked to Uncorrectable error counters

The elements in the aer_uncorrectable_error_string[] refer to the bit names
in Uncorrectable Error Status Register. Add PoisonTLPBlocked, which was
added in PCIe r3.1, sec 7.10.2.

Link: https://lore.kernel.org/r/20190827222145.32642-1-rajatja@google.com
Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# af65d1ad 18-Oct-2019 Patel, Mayurkumar <mayurkumar.patel@intel.com>

PCI/AER: Save AER Capability for suspend/resume

Previously we did not save and restore the AER configuration on
suspend/resume, so the configuration may be lost after resume.

Save the AER configuration during suspend and restore it during resume.

[bhelgaas: commit log]
Link: https://lore.kernel.org/r/92EBB4272BF81E4089A7126EC1E7B28492C3B007@IRSMSX101.ger.corp.intel.com
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>


# 9cc6f75b 07-May-2019 Frederick Lawler <fred@fredlawl.com>

PCI/AER: Log messages with pci_dev, not pcie_device

Log messages with pci_dev, not pcie_device. Factor out common message
prefixes with dev_fmt().

Example output change:

- aer 0000:00:00.0:pci002: AER enabled with IRQ ...
+ pcieport 0000:00:00.0: AER: enabled with IRQ ...

Link: https://lore.kernel.org/lkml/20190509141456.223614-5-helgaas@kernel.org
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>


# d5579183 07-May-2019 Frederick Lawler <fred@fredlawl.com>

PCI/AER: Replace dev_printk(KERN_DEBUG) with dev_info()

Replace dev_printk(KERN_DEBUG) with dev_info() or dev_err() to be more
consistent with other logging.

These could be converted to dev_dbg(), but that depends on
CONFIG_DYNAMIC_DEBUG and DEBUG, and we want most of these messages to
*always* be in the dmesg log.

Also remove a redundant kzalloc() failure message.

Link: https://lore.kernel.org/lkml/20190509141456.223614-2-helgaas@kernel.org
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>


# 807ffb1e 28-Jan-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

PCI/AER: Use match_string() helper to simplify the code

match_string() returns the array index of a matching string. Use it
instead of the open-coded implementation.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 1063a514 14-Dec-2018 Yanjiang Jin <yanjiang.jin@hxt-semitech.com>

PCI/AER: Queue one GHES event, not several uninitialized ones

ecae65e133f2 ("PCI/AER: Use kfifo_in_spinlocked() to insert locked
elements") replaced kfifo_put() with kfifo_in_spinlocked(), but passed the
*size* of the queue entry, where kfifo_in_spinlocked() expects the *number*
of entries to be copied.

We want to insert only one element into kfifo, not "sizeof(entry) = 16".
Without this patch, we would get 15 uninitialized elements.

Fixes: ecae65e133f2 ("PCI/AER: Use kfifo_in_spinlocked() to insert locked elements")
Signed-off-by: Yanjiang Jin <yanjiang.jin@hxt-semitech.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>


# 390e2db8 11-Oct-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Abstract AER interrupt handling

The aer_inject module was directly calling aer_irq(). This required the
AER driver export its private IRQ handler for no other reason than to
support error injection. A driver should not have to expose its private
interfaces, so use the IRQ subsystem to route injection to the AER driver,
and make aer_irq() a private interface.

This provides additional benefits:

First, directly calling the IRQ handler bypassed the IRQ subsytem so the
injection wasn't really synthesizing what happens if a shared AER interrupt
occurs.

The error injection had to provide the callback data directly, which may be
racing with a removal that is freeing that structure. The IRQ subsystem
can handle that race.

Finally, using the IRQ subsystem automatically reacts to threaded IRQs,
keeping the error injection abstracted from that implementation detail.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 369fd7b0 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Use managed resource allocations

Use the managed device resource allocations for the service data so the AER
driver doesn't need to manage it, further simplifying this driver.

Link: https://lore.kernel.org/linux-pci/20180918235848.26694-12-keith.busch@intel.com
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 6200cc5e 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Use threaded IRQ for bottom half

The threaded IRQ is naturally single threaded as desired, so use that to
simplify the AER bottom half handler. Since the root port structure has
much less to do now, remove the rpc construction helper routine.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ecae65e1 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Use kfifo_in_spinlocked() to insert locked elements

Use the recommended kernel API for writing to a concurrently-accessed
kfifo. No functional change here.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 27c1ce8b 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Use kfifo for tracking events instead of reimplementing it

The kernel provides a generic FIFO implementation, so no need to reinvent
that capability in a driver. Replace the AER-specific implementation with
the kernel-provided kfifo. Since the interrupt handler producer and work
queue consumer run single threaded, there is no need for additional
locking, so remove that lock, too.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# fcd4d369 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Remove error source from AER struct aer_rpc

The AER struct aer_rpc was carrying a copy of the error source simply as a
temperary variable. Remove that from the structure and use a stack
variable for the purpose.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 3e41a317 18-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Remove unused aer_error_resume()

The error recovery callbacks are only run on child devices. A Root Port is
never a child device, so this error resume callback was never invoked.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# bdb5ac85 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/ERR: Handle fatal error recovery

We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.

Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# c4eed62a 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/ERR: Use slot reset if available

The secondary bus reset may have link side effects that a hotplug capable
port may incorrectly react to. Use the slot specific reset for hotplug
ports, fixing the undesirable link down-up handling during error
recovering.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: fold in
https://lore.kernel.org/linux-pci/20180926152326.14821-1-keith.busch@intel.com
for issue reported by Stephen Rothwell <sfr@canb.auug.org.au>]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# 9d938ea5 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Don't read upstream ports below fatal errors

The AER driver has never read the config space of an endpoint that reported
a fatal error because the link to that device is considered unreliable.

An ERR_FATAL from an upstream port almost certainly indicates an error on
its upstream link, so we can't expect to reliably read its config space for
the same reason.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# 60271ab0 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Take reference on error devices

Error handling may be running in parallel with a hot removal. Reference
count the device during AER handling so the device can not be freed while
AER wants to reference it.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# c29de841 20-Sep-2018 Keith Busch <kbusch@kernel.org>

PCI: portdrv: Initialize service drivers directly

The PCI port driver saves the PCI state after initializing the device with
the applicable service devices. This was, however, before the service
drivers were even registered because PCI probe happens before the
device_initcall initialized those service drivers. The config space state
that the services set up were not being saved. The end result would cause
PCI devices to not react to events that the drivers think they did if the
PCI state ever needed to be restored.

Fix this by changing the service drivers from using the init calls to
having the portdrv driver calling the services directly. This will get the
state saved as desired, while making the relationship between the port
driver and the services under it more explicit in the code.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>


# 45687f96 17-Jul-2018 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI/AER: Don't clear AER bits if error handling is Firmware-First

If the platform requests Firmware-First error handling, firmware is
responsible for reading and clearing AER status bits. If OSPM also clears
them, we may miss errors. See ACPI v6.2, sec 18.3.2.5 and 18.4.

This race is mostly of theoretical significance, as it is not easy to
reasonably demonstrate it in testing.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: add similar guards to pci_cleanup_aer_uncorrect_error_status()
and pci_aer_clear_fatal_status()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 944d5859 31-Jul-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Remove duplicate PCI_EXP_AER_FLAGS definition

PCI_EXP_AER_FLAGS was defined twice (with identical definitions), once
under #ifdef CONFIG_ACPI_APEI, and again at the top level. This looks like
my merge error from these commits:

fd3362cb73de ("PCI/AER: Squash aerdrv_core.c into aerdrv.c")
41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")

Remove the duplicate PCI_EXP_AER_FLAGS definition.

Fixes: 41cbc9eb1a82 ("PCI/AER: Squash ecrc.c into aerdrv.c")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>


# 10d790d9 19-Jul-2018 Oza Pawandeep <poza@codeaurora.org>

PCI/AER: Clear device status bits during ERR_COR handling

In case of correctable error, the Correctable Error Detected bit in the
Device Status register is set. Clear it after handling the error.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# ec752f5d 19-Jul-2018 Oza Pawandeep <poza@codeaurora.org>

PCI/AER: Clear device status bits during ERR_FATAL and ERR_NONFATAL

Clear the device status bits while handling both ERR_FATAL and ERR_NONFATAL
cases.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: rename to pci_aer_clear_device_status(), declare internal to PCI
core instead of exposing it everywhere]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 5b6c0966 19-Jul-2018 Oza Pawandeep <poza@codeaurora.org>

PCI/AER: Factor out ERR_NONFATAL status bit clearing

aer_error_resume() clears all ERR_NONFATAL error status bits. This is
exactly what pci_cleanup_aer_uncorrect_error_status(), so use that instead
of duplicating the code.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# e7b0b847 19-Jul-2018 Oza Pawandeep <poza@codeaurora.org>

PCI/AER: Clear only ERR_NONFATAL bits during non-fatal recovery

pci_cleanup_aer_uncorrect_error_status() is called by driver .slot_reset()
methods when handling ERR_NONFATAL errors. Previously this cleared *all*
the bits, including ERR_FATAL bits.

Since we're only handling ERR_NONFATAL errors, clear only the ERR_NONFATAL
error status bits.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: split to separate patch]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7ab92e89 19-Jul-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Clear only ERR_FATAL status bits during fatal recovery

During recovery from fatal errors, we previously called
pci_cleanup_aer_uncorrect_error_status(), which cleared *all* uncorrectable
error status bits (both ERR_FATAL and ERR_NONFATAL).

Instead, call a new pci_aer_clear_fatal_status() that clears only the
ERR_FATAL bits (as indicated by the PCI_ERR_UNCOR_SEVER register).

Based-on-patch-by: Oza Pawandeep <poza@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 381634ca 19-Jul-2018 Sinan Kaya <okaya@codeaurora.org>

PCI: Hide pci_reset_bridge_secondary_bus() from drivers

Rename pci_reset_bridge_secondary_bus() to pci_bridge_secondary_bus_reset()
and move the declaration from linux/pci.h to drivers/pci.h to be used
internally in PCI directory only.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 18426238 19-Jul-2018 Sinan Kaya <okaya@codeaurora.org>

PCI: Handle error return from pci_reset_bridge_secondary_bus()

Commit 01fd61c0b9bd ("PCI: Add a return type for
pci_reset_bridge_secondary_bus()") added a return value to the function to
return if a device is accessible following a reset. Callers are not
checking the value.

Pass error code up high in the stack if device is not accessible.

Fixes: 01fd61c0b9bd ("PCI: Add a return type for pci_reset_bridge_secondary_bus()")
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 7af02fcd 03-Jul-2018 Alexandru Gagniuc <mr.nuke.me@gmail.com>

PCI/AER: Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST

According to the documentation, "pcie_ports=native", linux should use
native AER and DPC services. While that is true for the _OSC method
parsing, this is not the only place that is checked. Should the HEST
list PCIe ports as firmware-first, linux will not use native services.

This happens because aer_acpi_firmware_first() doesn't take 'pcie_ports'
into account. This is wrong. DPC uses the same logic when it decides
whether to load or not, so fixing this also fixes DPC not loading.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
[bhelgaas: return "false" from bool function (from kbuild robot)]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 12833017 21-Jun-2018 Rajat Jain <rajatja@google.com>

PCI/AER: Add sysfs attributes for rootport cumulative stats

Add sysfs attributes for rootport statistics (that are cumulative of all
the ERR_* messages seen on this PCI hierarchy).

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 81aa5206 21-Jun-2018 Rajat Jain <rajatja@google.com>

PCI/AER: Add sysfs attributes to provide AER stats and breakdown

Add sysfs attributes to provide total and breakdown of the AERs seen,
into different type of correctable, fatal and nonfatal errors:

/sys/bus/pci/devices/<dev>/aer_dev_correctable
/sys/bus/pci/devices/<dev>/aer_dev_fatal
/sys/bus/pci/devices/<dev>/aer_dev_nonfatal

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# db89ccbe 30-Jun-2018 Rajat Jain <rajatja@google.com>

PCI/AER: Define aer_stats structure for AER capable devices

Define a structure to hold the AER statistics. There are 2 groups of
statistics: dev_* counters that are to be collected for all AER capable
devices and rootport_* counters that are collected for all (AER capable)
rootports only. Allocate and free this structure when device is added or
released (thus counters survive the lifetime of the device).

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 60ed982a 21-Jun-2018 Rajat Jain <rajatja@google.com>

PCI/AER: Move internal declarations to drivers/pci/pci.h

Since pci_aer_init() and pci_no_aer() are used only internally, move their
declarations to the PCI internal header file. Also, no one cares about
return value of pci_aer_init(), so make it void.

Signed-off-by: Rajat Jain <rajatja@google.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# bd237801 26-Jun-2018 Tyler Baicar <tbaicar@codeaurora.org>

PCI/AER: Adopt lspci names for AER error decoding

lspci uses abbreviated naming for AER error strings. Adopt the same naming
convention for the AER printing so they match.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>


# 1e451160 19-Jul-2018 Keith Busch <kbusch@kernel.org>

PCI/AER: Expose internal API for obtaining AER information

Export some common AER functions and structures for other PCI core drivers
to use. Since this is making the function externally visible inside the
PCI core, prepend "aer_" to the function name.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: move AER declarations from linux/aer.h to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
Reviewed-by: Oza Pawandeep <poza@codeaurora.org>


# 4696b828 08-Jun-2018 Bjorn Helgaas <bhelgaas@google.com>

PCI/AER: Hoist aerdrv.c, aer_inject.c up to drivers/pci/pcie/

Hoist aerdrv.c, aer_inject.c up to drivers/pci/pcie/ so they're next to
other PCIe service drivers. No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>