History log of /linux-master/drivers/firmware/efi/cper.c
Revision Date Author Comments
# 0a5a46a6 06-Feb-2024 Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

PCI/AER: Generalize TLP Header Log reading

Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after
AER as the struct aer_header_log_regs. Also, not all places that handle TLP
Header Log use the struct and the struct members are named individually.

Generalize the struct name and members, and use it consistently where TLP
Header Log is being handled so that a pcie_read_tlp_log() helper can be
easily added.

Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop ixgbe changes for now, tidy whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# 54ce1927 31-Jan-2024 Ira Weiny <ira.weiny@intel.com>

cxl/cper: Fix errant CPER prints for CXL events

Jonathan reports that CXL CPER events dump an extra generic error
message.

{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
{1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
{1}[Hardware Error]: section length: 0x90
{1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
{1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
...

CXL events were rerouted though the CXL subsystem for additional
processing. However, when that work was done it was missed that
cper_estatus_print_section() continued with a generic error message
which is confusing.

Teach CPER print code to ignore printing details of some section types.
Assign the CXL event GUIDs to this set to prevent confusing unknown
prints.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# abdbf1a2 28-Oct-2022 Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>

efi/cper, cxl: Decode CXL Protocol Error Section

Add support for decoding CXL Protocol Error Section as defined in UEFI 2.10
Section N.2.13.

Do the section decoding in a new cper_cxl.c file. This new file will be
used in the future for more CXL CPERs decode support. Add this to the
existing UEFI_CPER config.

Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 5012524e 09-Oct-2022 Jia He <justin.he@arm.com>

efi/cper: Export several helpers for ghes_edac to use

Before ghes_edac can be turned back into a proper module again, export
several helpers which are going to be used by it.

Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20221010023559.69655-2-justin.he@arm.com


# 1e0e7f10 08-Mar-2022 Shuai Xue <xueshuai@linux.alibaba.com>

efi/cper: Reformat CPER memory error location to more readable

Remove the space after the colon in cper_mem_err_location() so that it
is easier to parse its output this way, both by humans and tools.

No functional changes.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220308144053.49090-4-xueshuai@linux.alibaba.com


# ed27b5df 08-Mar-2022 Shuai Xue <xueshuai@linux.alibaba.com>

EDAC/ghes: Unify CPER memory error location reporting

Switch the GHES EDAC memory error reporting functions to use the common
CPER ones and get rid of code duplication.

[ bp:
- rewrite commit message, remove useless text
- rip out useless reformatting
- align function params on the opening brace
- rename function to a more descriptive name
- drop useless function exports
- handle buffer lengths properly when printing other detail
- remove useless casting
]

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220308144053.49090-3-xueshuai@linux.alibaba.com


# bdae7965 08-Mar-2022 Shuai Xue <xueshuai@linux.alibaba.com>

efi/cper: Add a cper_mem_err_status_str() to decode error description

Introduce a new helper function cper_mem_err_status_str() to decode the
error status value into a human readable string.

[ bp: Massage. ]

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220308144053.49090-2-xueshuai@linux.alibaba.com


# b3a72ca8 01-Sep-2021 Ard Biesheuvel <ardb@kernel.org>

efi/cper: use stack buffer for error record decoding

Joe reports that using a statically allocated buffer for converting CPER
error records into human readable text is probably a bad idea. Even
though we are not aware of any actual issues, a stack buffer is clearly
a better choice here anyway, so let's move the buffer into the stack
frames of the two functions that refer to it.

Cc: <stable@vger.kernel.org>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 1be72c8e 23-Aug-2021 Shuai Xue <xueshuai@linux.alibaba.com>

efi: cper: check section header more appropriately

When checking a generic status block, we iterate over all the generic data
blocks. The loop condition checks that the generic data block is valid.
Because the size of data blocks (excluding error data) may vary depending
on the revision and the revision is contained within the data block, we
should ensure that enough of the current data block is valid appropriately
for different revision.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 5eff88dd 21-Apr-2021 Rasmus Villemoes <linux@rasmusvillemoes.dk>

efi: cper: fix scnprintf() use in cper_mem_err_location()

The last two if-clauses fail to update n, so whatever they might have
written at &msg[n] would be cut off by the final nul-termination.

That nul-termination is redundant; scnprintf(), just like snprintf(),
guarantees a nul-terminated output buffer, provided the buffer size is
positive.

And there's no need to discount one byte from the initial buffer;
vsnprintf() expects to be given the full buffer size - it's not going
to write the nul-terminator one beyond the given (buffer, size) pair.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 942859d9 21-Apr-2021 Rasmus Villemoes <linux@rasmusvillemoes.dk>

efi: cper: fix snprintf() use in cper_dimm_err_location()

snprintf() should be given the full buffer size, not one less. And it
guarantees nul-termination, so doing it manually afterwards is
pointless.

It's even potentially harmful (though probably not in practice because
CPER_REC_LEN is 256), due to the "return how much would have been
written had the buffer been big enough" semantics. I.e., if the bank
and/or device strings are long enough that the "DIMM location ..."
output gets truncated, writing to msg[n] is a buffer overflow.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Fixes: 3760cd20402d4 ("CPER: Adjust code flow of some functions")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 612b5d50 19-Aug-2020 Alex Kluver <alex.kluver@hpe.com>

cper,edac,efi: Memory Error Record: bank group/address and chip id

Updates to the UEFI 2.8 Memory Error Record allow splitting the bank field
into bank address and bank group, and using the last 3 bits of the extended
field as a chip identifier.

When needed, print correct version of bank field, bank group, and chip
identification.

Based on UEFI 2.8 Table 299. Memory Error Record.

Signed-off-by: Alex Kluver <alex.kluver@hpe.com>
Reviewed-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200819143544.155096-3-alex.kluver@hpe.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 9baf68cc 19-Aug-2020 Alex Kluver <alex.kluver@hpe.com>

edac,ghes,cper: Add Row Extension to Memory Error Record

Memory errors could be printed with incorrect row values since the DIMM
size has outgrown the 16 bit row field in the CPER structure. UEFI
Specification Version 2.8 has increased the size of row by allowing it to
use the first 2 bits from a previously reserved space within the structure.

When needed, add the extension bits to the row value printed.

Based on UEFI 2.8 Table 299. Memory Error Record

Signed-off-by: Alex Kluver <alex.kluver@hpe.com>
Tested-by: Russ Anderson <russ.anderson@hpe.com>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Reviewed-by: Kyle Meyer <kyle.meyer@hpe.com>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200819143544.155096-2-alex.kluver@hpe.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 3d8c11ef 11-May-2020 Punit Agrawal <punit1.agrawal@toshiba.co.jp>

efi: cper: Add support for printing Firmware Error Record Reference

While debugging a boot failure, the following unknown error record was
seen in the boot logs.

<...>
BERT: Error records from previous boot:
[Hardware Error]: event severity: fatal
[Hardware Error]: Error 0, type: fatal
[Hardware Error]: section type: unknown, 81212a96-09ed-4996-9471-8d729c8e69ed
[Hardware Error]: section length: 0x290
[Hardware Error]: 00000000: 00000001 00000000 00000000 00020002 ................
[Hardware Error]: 00000010: 00020002 0000001f 00000320 00000000 ........ .......
[Hardware Error]: 00000020: 00000000 00000000 00000000 00000000 ................
[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
<...>

On further investigation, it was found that the error record with
UUID (81212a96-09ed-4996-9471-8d729c8e69ed) has been defined in the
UEFI Specification at least since v2.4 and has recently had additional
fields defined in v2.7 Section N.2.10 Firmware Error Record Reference.

Add support for parsing and printing the defined fields to give users
a chance to figure out what went wrong.

Signed-off-by: Punit Agrawal <punit1.agrawal@toshiba.co.jp>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: James Morse <james.morse@arm.com>
Cc: linux-acpi@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Link: https://lore.kernel.org/r/20200512045502.3810339-1-punit1.agrawal@toshiba.co.jp
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# b450b30b 09-Apr-2020 Takashi Iwai <tiwai@suse.de>

efi/cper: Use scnprintf() for avoiding potential buffer overflow

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200311072145.5001-1-tiwai@suse.de
Link: https://lore.kernel.org/r/20200409130434.6736-2-ardb@kernel.org


# 6fb9367a 02-Oct-2019 Lukas Wunner <lukas@wunner.de>

efi/cper: Fix endianness of PCIe class code

The CPER parser assumes that the class code is big endian, but at least
on this edk2-derived Intel Purley platform it's little endian:

efi: EFI v2.50 by EDK II BIOS ID:PLYDCRB1.86B.0119.R05.1701181843
DMI: Intel Corporation PURLEY/PURLEY, BIOS PLYDCRB1.86B.0119.R05.1701181843 01/18/2017

{1}[Hardware Error]: device_id: 0000:5d:00.0
{1}[Hardware Error]: slot: 0
{1}[Hardware Error]: secondary_bus: 0x5e
{1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2030
{1}[Hardware Error]: class_code: 000406
^^^^^^ (should be 060400)

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Dooks <ben.dooks@codethink.co.uk>
Cc: Dave Young <dyoung@redhat.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Jerry Snitselaar <jsnitsel@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott Talbert <swt@techie.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Cc: linux-integrity@vger.kernel.org
Link: https://lkml.kernel.org/r/20191002165904.8819-2-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# b194a77f 25-Jul-2019 Xiaofei Tan <tanxiaofei@huawei.com>

efi: cper: print AER info of PCIe fatal error

AER info of PCIe fatal error is not printed in the current driver.
Because APEI driver will panic directly for fatal error, and can't
run to the place of printing AER info.

An example log is as following:
{763}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 11
{763}[Hardware Error]: event severity: fatal
{763}[Hardware Error]: Error 0, type: fatal
{763}[Hardware Error]: section_type: PCIe error
{763}[Hardware Error]: port_type: 0, PCIe end point
{763}[Hardware Error]: version: 4.0
{763}[Hardware Error]: command: 0x0000, status: 0x0010
{763}[Hardware Error]: device_id: 0000:82:00.0
{763}[Hardware Error]: slot: 0
{763}[Hardware Error]: secondary_bus: 0x00
{763}[Hardware Error]: vendor_id: 0x8086, device_id: 0x10fb
{763}[Hardware Error]: class_code: 000002
Kernel panic - not syncing: Fatal hardware error!

This issue was imported by the patch, '37448adfc7ce ("aerdrv: Move
cper_print_aer() call out of interrupt context")'. To fix this issue,
this patch adds print of AER info in cper_print_pcie() for fatal error.

Here is the example log after this patch applied:
{24}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 10
{24}[Hardware Error]: event severity: fatal
{24}[Hardware Error]: Error 0, type: fatal
{24}[Hardware Error]: section_type: PCIe error
{24}[Hardware Error]: port_type: 0, PCIe end point
{24}[Hardware Error]: version: 4.0
{24}[Hardware Error]: command: 0x0546, status: 0x4010
{24}[Hardware Error]: device_id: 0000:01:00.0
{24}[Hardware Error]: slot: 0
{24}[Hardware Error]: secondary_bus: 0x00
{24}[Hardware Error]: vendor_id: 0x15b3, device_id: 0x1019
{24}[Hardware Error]: class_code: 000002
{24}[Hardware Error]: aer_uncor_status: 0x00040000, aer_uncor_mask: 0x00000000
{24}[Hardware Error]: aer_uncor_severity: 0x00062010
{24}[Hardware Error]: TLP Header: 000000c0 01010000 00000001 00000000
Kernel panic - not syncing: Fatal hardware error!

Fixes: 37448adfc7ce ("aerdrv: Move cper_print_aer() call out of interrupt context")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[ardb: put parens around terms of && operator]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>


# 45b14a4f 28-Jan-2019 Ross Lagerwall <ross.lagerwall@citrix.com>

efi: cper: Fix possible out-of-bounds access

When checking a generic status block, we iterate over all the generic
data blocks. The loop condition only checks that the start of the
generic data block is valid (within estatus->data_length) but not the
whole block. Because the size of data blocks (excluding error data) may
vary depending on the revision and the revision is contained within the
data block, ensure that enough of the current data block is valid before
dereferencing any members otherwise an out-of-bounds access may occur if
estatus->data_length is invalid.

This relies on the fact that struct acpi_hest_generic_data_v300 is a
superset of the earlier version. Also rework the other checks to avoid
potential underflow.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <baicar.tyler@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 4febfb8d 02-Feb-2019 Ard Biesheuvel <ardb@kernel.org>

efi: Replace GPL license boilerplate with SPDX headers

Replace all GPL license blurbs with an equivalent SPDX header (most
files are GPLv2, some are GPLv2+). While at it, drop some outdated
header changelogs as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Alexander Graf <agraf@suse.de>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20190202094119.13230-7-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# e8f4194d 19-Jul-2018 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

efi/cper: Use consistent types for UUIDs

The commit:

2f74f09bce4f ("efi: parse ARM processor error")

... brought inconsistency in UUID types which are used across the CPER.

Fix this by moving to use guid_t API everywhere.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180720014726.24031-9-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 7bb49709 11-Jul-2018 Arnd Bergmann <arnd@arndb.de>

efi/cper: Avoid using get_seconds()

get_seconds() is deprecated because of the 32-bit time overflow
in y2038/y2106 on 32-bit architectures. The way it is used in
cper_next_record_id() causes an overflow in 2106 when unsigned UTC
seconds overflow, even on 64-bit architectures.

This starts using ktime_get_real_seconds() to give us more than 32 bits
of timestamp on all architectures, and then changes the algorithm to use
39 bits for the timestamp after the y2038 wrap date, plus an always-1
bit at the top. This gives us another 127 epochs of 136 years, with
strictly monotonically increasing sequence numbers across boots.

This is almost certainly overkill, but seems better than just extending
the deadline from 2038 to 2106.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180711094040.12506-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f9e1bdb9 03-May-2018 Yazen Ghannam <yazen.ghannam@amd.com>

efi: Decode IA32/X64 Processor Error Section

Recognize the IA32/X64 Processor Error Section.

Do the section decoding in a new "cper-x86.c" file and add this to the
Makefile depending on a new "UEFI_CPER_X86" config option.

Print the Local APIC ID and CPUID info from the Processor Error Record.

The "Processor Error Info" and "Processor Context" fields will be
decoded in following patches.

Based on UEFI 2.7 Table 252. Processor Error Record.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 75e4fd31 03-May-2018 Borislav Petkov <bp@suse.de>

efi/cper: Remove the INDENT_SP silliness

A separate define just to print a space character is silly and
completely unneeded. Remove it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-3-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c6d8c8ef 02-Jan-2018 Tyler Baicar <tbaicar@codeaurora.org>

efi: Move ARM CPER code to new file

The ARM CPER code is currently mixed in with the other CPER code. Move it
to a new file to separate it from the rest of the CPER code.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasyl Gomonovych <gomonovych@gmail.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180102181042.19074-5-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# c0020756 19-Jul-2017 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

efi: switch to use new generic UUID API

There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# c4335fdd 17-Aug-2017 gengdongjiu <gengdongjiu@huawei.com>

ACPI: APEI: fix the wrong iteration of generic error status block

The revision 0x300 generic error data entry is different
from the old version, but currently iterating through the
GHES estatus blocks does not take into account this difference.
This will lead to failure to get the right data entry if GHES
has revision 0x300 error data entry.

Update the GHES estatus iteration macro to properly increment using
acpi_hest_get_next(), and correct the iteration termination condition
because the status block data length only includes error data
length.

Convert the CPER estatus checking and printing iteration logic
to use same macro.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e9279e83 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

trace, ras: add ARM processor error trace event

Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 0fc300f4 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

efi: print unrecognized CPER section

UEFI spec allows for non-standard section in Common Platform Error
Record. This is defined in section N.2.3 of UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match with
one of the section types that the kernel knows how to parse, the
section is skipped. Therefore, user is not able to see
such CPER data, for instance, error record of non-standard section.

This change prints out the raw data in hex in the dmesg buffer so
that non-standard sections are reported to the user. Non-standard
section type errors should be reported to the user because these
can include errors which are vendor specific. The data length is
taken from Error Data length field of Generic Error Data Entry.

The following is a sample output from dmesg:
Hardware error from APEI Generic Hardware Error Source: 2
It has been corrected by h/w and requires no further action
event severity: corrected
time: precise 2017-03-15 20:37:35
Error 0, type: corrected
section type: unknown, d2e2621c-f936-468d-0d84-15a4ed015c8b
section length: 0x238
00000000: 4d415201 4d492031 453a4d45 435f4343 .RAM1 IMEM:ECC_C
00000010: 53515f45 44525f42 00000000 00000000 E_QSB_RD........
00000020: 00000000 00000000 00000000 00000000 ................
00000030: 00000000 00000000 01010000 01010000 ................
00000040: 00000000 00000000 00000005 00000000 ................
00000050: 01010000 00000000 00000001 00dddd00 ................
...

The raw data from the error can then be decoded using vendor
specific tools.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 2f74f09b 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

efi: parse ARM processor error

Add support for ARM Common Platform Error Record (CPER).
UEFI 2.6 specification adds support for ARM specific
processor error information to be reported as part of the
CPER records. This provides more detail on for processor error logs.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 8a94471f 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

cper: add timestamp print to CPER status printing

The ACPI 6.1 spec added a timestamp to the generic error data
entry structure. Print the timestamp out when printing out the
error information.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# bbcc2e7b 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

ras: acpi/apei: cper: add support for generic data v3 structure

The ACPI 6.1 spec adds a new revision of the generic error data
entry structure. Add support to handle the new structure as well
as properly verify and iterate through the generic data entries.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 4c62360d 30-Jun-2015 Tony Luck <tony.luck@intel.com>

efi: Handle memory error structures produced based on old versions of standard

The memory error record structure includes as its first field a
bitmask of which subsequent fields are valid. The allows new fields
to be added to the structure while keeping compatibility with older
software that parses these records. This mechanism was used between
versions 2.2 and 2.3 to add four new fields, growing the size of the
structure from 73 bytes to 80. But Linux just added all the new
fields so this test:
if (gdata->error_data_length >= sizeof(*mem_err))
cper_print_mem(newpfx, mem_err);
else
goto err_section_too_small;
now make Linux complain about old format records being too short.

Add a definition for the old format of the structure and use that
for the minimum size check. Pass the actual size to cper_print_mem()
so it can sanity check the validation_bits field to ensure that if
a BIOS using the old format sets bits as if it were new, we won't
access fields beyond the end of the structure.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>


# dbcf3e06 31-Oct-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org>

RAS/tracing: Use trace_seq_buffer_ptr() helper instead of open coded

Use the helper function trace_seq_buffer_ptr() to get the current location
of the next buffer write of a trace_seq object, instead of open coding
it.

This facilitates the conversion of trace_seq to use seq_buf.

Tested-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Petr Mladek <pmladek@suse.cz>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>


# 2dfb7d51 17-Jun-2014 Chen, Gong <gong.chen@linux.intel.com>

trace, RAS: Add eMCA trace event interface

Add trace interface to elaborate all H/W error related information.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 3760cd20 11-Jun-2014 Chen, Gong <gong.chen@linux.intel.com>

CPER: Adjust code flow of some functions

Some codes can be reorganzied as a common function for other usages.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 0a00fd5e 03-Jun-2014 Lv Zheng <lv.zheng@intel.com>

ACPICA: Restore error table definitions to reduce code differences between Linux and ACPICA upstream.

The following commit has changed ACPICA table header definitions:

Commit: 88f074f4871a8c212b212b725e4dcdcdb09613c1
Subject: ACPI, CPER: Update cper info

While such definitions are currently maintained in ACPICA. As the
modifications applying to the table definitions affect other OSPMs'
drivers, it is very difficult for ACPICA to initiate a process to
complete the merge. Thus this commit finally only leaves us divergences.

Revert such naming modifications to reduce the source code differecnes
between Linux and ACPICA upstream. No functional changes.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7ea6c6c1 28-Oct-2013 Tony Luck <tony.luck@intel.com>

Move cper.c from drivers/acpi/apei to drivers/firmware/efi

cper.c contains code to decode and print "Common Platform Error Records".
Originally added under drivers/acpi/apei because the only user was in that
same directory - but now we have another consumer, and we shouldn't have
to force CONFIG_ACPI_APEI get access to this code.

Since CPER is defined in the UEFI specification - the logical home for
this code is under drivers/firmware/efi/

Acked-by: Matt Fleming <matt.fleming@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>