History log of /linux-master/drivers/acpi/apei/bert.c
Revision Date Author Comments
# d38f6bce 06-Jun-2023 Miaohe Lin <linmiaohe@huawei.com>

ACPI: APEI: mark bert_disable as __initdata

It's only used inside the __init section. Mark it __initdata.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fd936fd8 23-May-2023 Arnd Bergmann <arnd@arndb.de>

efi: fix missing prototype warnings

The cper.c file needs to include an extra header, and efi_zboot_entry
needs an extern declaration to avoid these 'make W=1' warnings:

drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]

To make this easier, move the cper specific declarations to
include/linux/cper.h.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 37096428 24-Aug-2022 Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>

ACPI: APEI: Add BERT error log footer

Print total number of records found during BERT log parsing.
This also simplify dmesg parser implementation for BERT events.

Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# c3481b6b 22-Jun-2022 Tony Luck <tony.luck@intel.com>

ACPI: APEI: Better fix to avoid spamming the console with old error logs

The fix in commit 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT
table data") does not work as intended on systems where the BIOS has a
fixed size block of memory for the BERT table, relying on s/w to quit
when it finds a record with estatus->block_status == 0. On these systems
all errors are suppressed because the check:

if (region_len < ACPI_BERT_PRINT_MAX_LEN)

always fails.

New scheme skips individual CPER records that are too large, and also
limits the total number of records that will be printed to 5.

Fixes: 3f8dec116210 ("ACPI/APEI: Limit printable size of BERT table data")
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3f8dec11 08-Mar-2022 Darren Hart <darren@os.amperecomputing.com>

ACPI/APEI: Limit printable size of BERT table data

Platforms with large BERT table data can trigger soft lockup errors
while attempting to print the entire BERT table data to the console at
boot:

watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1]

Observed on Ampere Altra systems with a single BERT record of ~250KB.

The original bert driver appears to have assumed relatively small table
data. Since it is impractical to reassemble large table data from
interwoven console messages, and the table data is available in

/sys/firmware/acpi/tables/data/BERT

limit the size for tables printed to the console to 1024 (for no reason
other than it seemed like a good place to kick off the discussion, would
appreciate feedback from existing users in terms of what size would
maintain their current usage model).

Alternatively, we could make printing a CONFIG option, use the
bert_disable boot arg (or something similar), or use a debug log level.
However, all those solutions require extra steps or change the existing
behavior for small table data. Limiting the size preserves existing
behavior on existing platforms with small table data, and eliminates the
soft lockups for platforms with large table data, while still making it
available.

Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f3303ff6 05-Mar-2022 Randy Dunlap <rdunlap@infradead.org>

ACPI: APEI: fix return value of __setup handlers

__setup() handlers should return 1 to indicate that the boot option
has been handled. Returning 0 causes a boot option to be listed in
the Unknown kernel command line parameters and also added to init's
arg list (if no '=' sign) or environment list (if of the form 'a=b').

Unknown kernel command line parameters "erst_disable
bert_disable hest_disable BOOT_IMAGE=/boot/bzImage-517rc6", will be
passed to user space.

Run /sbin/init as init process
with arguments:
/sbin/init
erst_disable
bert_disable
hest_disable
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6

Fixes: a3e2acc5e37b ("ACPI / APEI: Add Boot Error Record Table (BERT) support")
Fixes: a08f82d08053 ("ACPI, APEI, Error Record Serialization Table (ERST) support")
Fixes: 9dc966641677 ("ACPI, APEI, HEST table parsing")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3d995f88 07-May-2020 Hanjun Guo <guohanjun@huawei.com>

ACPI: APEI: Put the boot error record table after parsing

The mapped boot error record table is not used after
bert_init(), release it.

Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 82664963 01-Jun-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 437

Based on 1 normalized pattern(s):

this file is licensed under gplv2

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 22 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.129548190@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 1c0d9b1c 28-Jan-2019 Ross Lagerwall <ross.lagerwall@citrix.com>

ACPI: APEI: Fix possible out-of-bounds access to BERT region

Check that the length recorded in the generic error status block is
within the region before checking the contents of the region itself.

Otherwise it may result in an out-of-bounds access if the system
firmware has generated a status block with an invalid length (larger
than the mapped region). Also move the block_status check so that it
only happens after the block has been verified to be within the mapped
region.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <baicar.tyler@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 41139ac3 19-Feb-2017 Huang Ying <ying.huang@intel.com>

ACPI: APEI: Fix BERT resources conflict with ACPI NVS area

It was reported that on some machines, there is overlap between ACPI
NVS area and BERT address range. This appears reasonable because BERT
contents need to be non-volatile across reboot. But this will cause
resources conflict in current Linux kernel implementation because the
ACPI NVS area is marked as busy. The resource conflict is fixed via
excluding the ACPI NVS area when requesting IO resources for BERT.
When accessing the BERT contents, the whole BERT address range will be
ioremapped and accessed.

Reported-and-tested-by: Hans Kristian Rosbach <hansr@raskesider.no>
Signed-off-by: Ying Huang <ying.huang@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# a3e2acc5 29-Jun-2016 Huang Ying <ying.huang@intel.com>

ACPI / APEI: Add Boot Error Record Table (BERT) support

ACPI/APEI is designed to verifiy/report H/W errors, like Corrected
Error(CE) and Uncorrected Error(UC). It contains four tables: HEST,
ERST, EINJ and BERT. The first three tables have been merged for
a long time, but because of lacking BIOS support for BERT, the
support for BERT is pending until now. Recently on ARM 64 platform
it is has been supported. So here we come.

Under normal circumstances, when a hardware error occurs, kernel will
be notified via NMI, MCE or some other method, then kernel will
process the error condition, report it, and recover it if possible.
But sometime, the situation is so bad, so that firmware may choose to
reset directly without notifying Linux kernel.

Linux kernel can use the Boot Error Record Table (BERT) to get the
un-notified hardware errors that occurred in a previous boot. In this
patch, the error information is reported via printk.

For more information about BERT, please refer to ACPI Specification
version 6.0, section 18.3.1:
http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf

The following log is a BERT record after system reboot because of hitting
a fatal memory error:
BERT: Error records from previous boot:
[Hardware Error]: It has been corrected by h/w and requires no further action
[Hardware Error]: event severity: corrected
[Hardware Error]: Error 0, type: recoverable
[Hardware Error]: section_type: memory error
[Hardware Error]: error_status: 0x0000000000000400
[Hardware Error]: physical_address: 0xffffffffffffffff
[Hardware Error]: card: 1 module: 2 bank: 3 row: 1 column: 2 bit_position: 5
[Hardware Error]: error_type: 2, single-bit ECC

[Tomasz Nowicki: Clear error status at the end of error handling]
[Tony: Applied some cleanups suggested by Fu Wei]
[Fu Wei: delete EXPORT_SYMBOL_GPL(bert_disable), improve the code]

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Tested-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Fu Wei <fu.wei@linaro.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>