History log of /linux-master/drivers/acpi/apei/ghes.c
Revision Date Author Comments
# f2f212f3 20-Nov-2023 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ACPI: APEI: GHES: Convert to platform remove callback returning void

The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.

To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().

Instead of returning an error code, emit a better error message than the
core. Apart from the improved error message this patch has no effects
for the driver.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f3e6b3ae 18-Feb-2024 Dan Williams <dan.j.williams@intel.com>

acpi/ghes: Remove CXL CPER notifications

Initial tests with the CXL CPER implementation identified that error
reports were being duplicated in the log and the trace event [1]. Then
it was discovered that the notification handler took sleeping locks
while the GHES event handling runs in spin_lock_irqsave() context [2]

While the duplicate reporting was fixed in v6.8-rc4, the fix for the
sleeping-lock-vs-atomic collision would enjoy more time to settle and
gain some test cycles. Given how late it is in the development cycle,
remove the CXL hookup for now and try again during the next merge
window.

Note that end result is that v6.8 does not emit CXL CPER payloads to the
kernel log, but this is in line with the CXL trend to move error
reporting to trace events instead of the kernel log.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1]
Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# 54ce1927 31-Jan-2024 Ira Weiny <ira.weiny@intel.com>

cxl/cper: Fix errant CPER prints for CXL events

Jonathan reports that CXL CPER events dump an extra generic error
message.

{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: recoverable
{1}[Hardware Error]: Error 0, type: recoverable
{1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
{1}[Hardware Error]: section length: 0x90
{1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
{1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................
...

CXL events were rerouted though the CXL subsystem for additional
processing. However, when that work was done it was missed that
cper_estatus_print_section() continued with a generic error message
which is confusing.

Teach CPER print code to ignore printing details of some section types.
Assign the CXL event GUIDs to this set to prevent confusing unknown
prints.

Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 671a794c 20-Dec-2023 Ira Weiny <ira.weiny@intel.com>

acpi/ghes: Process CXL Component Events

BIOS can configure memory devices as firmware first. This will send CXL
events to the firmware instead of the OS. The firmware can then send
these events to the OS via UEFI.

UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER)
format for CXL Component Events. The format is mostly the same as the
CXL Common Event Record Format. The difference is the use of a GUID in
the Section Type rather than a UUID as part of the event itself.

Add GHES support to detect CXL CPER records and call a registered
callback with the event.

A notifier chain was considered for the callback but the complexity did
not justify the use case as only the CXL subsystem requires this event.
Enforce that only one callback can be registered at any time.

Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-7-1bb8a4ca2c7a@intel.com
[djbw: fixup checkpatch errors]
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>


# a70297d2 17-Dec-2023 Shuai Xue <xueshuai@linux.alibaba.com>

ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events

There are two major types of uncorrected recoverable (UCR) errors :

- Synchronous error: The error is detected and raised at the point of
the consumption in the execution flow, e.g. when a CPU tries to
access a poisoned cache line. The CPU will take a synchronous error
exception such as Synchronous External Abort (SEA) on Arm64 and
Machine Check Exception (MCE) on X86. OS requires to take action (for
example, offline failure page/kill failure thread) to recover this
uncorrectable error.

- Asynchronous error: The error is detected out of processor execution
context, e.g. when an error is detected by a background scrubber.
Some data in the memory are corrupted. But the data have not been
consumed. OS is optional to take action to recover this uncorrectable
error.

When APEI firmware first is enabled, a platform may describe one error
source for the handling of synchronous errors (e.g. MCE or SEA notification
), or for handling asynchronous errors (e.g. SCI or External Interrupt
notification). In other words, we can distinguish synchronous errors by
APEI notification. For synchronous errors, kernel will kill the current
process which accessing the poisoned page by sending SIGBUS with
BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the
process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in
early kill mode. However, the GHES driver always sets mf_flags to 0 so that
all synchronous errors are handled as asynchronous errors in memory failure.

To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous
events.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Tested-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e2abc47a 20-Sep-2023 Shiju Jose <shiju.jose@huawei.com>

ACPI: APEI: Fix AER info corruption when error status data has multiple sections

ghes_handle_aer() passes AER data to the PCI core for logging and
recovery by calling aer_recover_queue() with a pointer to struct
aer_capability_regs.

The problem was that aer_recover_queue() queues the pointer directly
without copying the aer_capability_regs data. The pointer was to
the ghes->estatus buffer, which could be reused before
aer_recover_work_func() reads the data.

To avoid this problem, allocate a new aer_capability_regs structure
from the ghes_estatus_pool, copy the AER data from the ghes->estatus
buffer into it, pass a pointer to the new struct to
aer_recover_queue(), and free it after aer_recover_work_func() has
processed it.

Reported-by: Bjorn Helgaas <helgaas@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9368aa18 19-May-2023 Li Yang <leoyang.li@nxp.com>

APEI: GHES: correctly return NULL for ghes_get_devices()

Since 315bada690e0 ("EDAC: Check for GHES preference in the
chipset-specific EDAC drivers"), vendor specific EDAC driver will not
probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device
is present. Make ghes_get_devices() return NULL when the GHES device
list is empty to fix the problem.

Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module")
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 43482701 27-May-2023 Miaohe Lin <linmiaohe@huawei.com>

ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request()

ghes_estatus_pool_size_request() is unused now, so remove it.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 292a089d 20-Dec-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

treewide: Convert del_timer*() to timer_shutdown*()

Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown". After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed. It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

$ cat timer.cocci
@@
expression ptr, slab;
identifier timer, rfield;
@@
(
- del_timer(&ptr->timer);
+ timer_shutdown(&ptr->timer);
|
- del_timer_sync(&ptr->timer);
+ timer_shutdown_sync(&ptr->timer);
)
... when strict
when != ptr->timer
(
kfree_rcu(ptr, rfield);
|
kmem_cache_free(slab, ptr);
|
kfree(ptr);
)

$ spatch timer.cocci . > /tmp/t.patch
$ patch -p1 < /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# dd3fa54b 24-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()

Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:

/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
*/

New elements are added to the cache this way:

if (!ghes_estatus_cached(estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, estatus))
ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.

And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.

[ bp:
- Flesh out and summarize what was discussed on the thread now
that that cache contraption is understood;
- Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 36006ccb 13-Oct-2022 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>

ACPI: APEI: Drop unsetting driver data on remove

Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no
driver is bound") the driver core cares for cleaning driver data, so
don't do it in the driver, too.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0d2aa70b 18-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()

Some documentation first, about how this machinery works:

It seems, the intent of the GHES error records cache is to collect
already reported errors - see the ghes_estatus_cached() checks. There's
even a sentence trying to say what this does:

/*
* GHES error status reporting throttle, to report more kinds of
* errors, instead of just most frequently occurred errors.
*/

New elements are added to the cache this way:

if (!ghes_estatus_cached(estatus)) {
if (ghes_print_estatus(NULL, ghes->generic, estatus))
ghes_estatus_cache_add(ghes->generic, estatus);

The intent being, once this new error record is reported, it gets cached
so that it doesn't get reported for a while due to too many, same-type
error records getting reported in burst-like scenarios. I.e., new,
unreported error types can have a higher chance of getting reported.

Now, the loop in ghes_estatus_cache_add() is trying to pick out the
oldest element in there. Meaning, something which got reported already
but a long while ago, i.e., a LRU-type scheme.

And the cmpxchg() is there presumably to make sure when that selected
element slot_cache is removed, it really *is* that element that gets
removed and not one which replaced it in the meantime.

Now, ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a newly cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is being done here, the race can only
cause a failure if the selected slot was updated with another new item
concurrently, which means that it is arbitrary which of those two items
gets dropped.

And "dropped" here means, the item doesn't get added to the cache so
the next time it is seen, it'll get reported again and an insertion
attempt will be done again. Eventually, it'll get inserted and all those
times when the insertion fails, the item will get reported although the
cache is supposed to prevent that and "ratelimit" those repeated error
records. Not a big deal in any case.

This means the cmpxchg() and the special case are not necessary.
Therefore, just drop the existing item unconditionally.

Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock
section since there is no actually dereferencing the pointer at all.

[ bp:
- Flesh out and summarize what was discussed on the thread now
that that cache contraption is understood;
- Touch up code style. ]

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com


# 802e7f1d 09-Oct-2022 Jia He <justin.he@arm.com>

EDAC/ghes: Make ghes_edac a proper module

Commit

dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")

introduced a bug leading to ghes_edac_register() to be invoked before
edac_init(). Because at that time the bus "edac" hadn't been even
registered, this created sysfs nodes as /devices/mc0 instead of
/sys/devices/system/edac/mc/mc0 on an Ampere eMag server.

Fix this by turning ghes_edac into a proper module.

The list of GHES devices returned is not protected from being modified
concurrently but it is pretty static as it gets created only during GHES
init and latter is not a module so...

[ bp: Massage. ]

Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Co-developed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com


# 9057a3f7 09-Oct-2022 Jia He <justin.he@arm.com>

EDAC/ghes: Prepare to make ghes_edac a proper module

To make ghes_edac a proper module, prepare to decouple its dependencies
from GHES.

Move the ghes_edac.force_load parameter to ghes.c in order to
properly control whether ghes_edac should be force-loaded: In
ghes_edac_register() it is too late to set the module flag.

Introduce a helper ghes_get_devices(), which returns the list of GHES
devices which got probed when the platform-check passes on the system.

The previous force_load check is not needed in ghes_edac_unregister()
since it will be checked in the module's init function of ghes_edac
later.

[ bp: Massage. ]

Suggested-by: Toshi Kani <toshi.kani@hpe.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com


# 8e40612f 09-Oct-2022 Jia He <justin.he@arm.com>

EDAC/ghes: Add a notifier for reporting memory errors

In order to make it a proper module and disentangle it from facilities,
add a notifier for reporting memory errors. Use an atomic notifier
because calls sites like ghes_proc_in_irq() run in interrupt context.

[ bp: Massage commit message. ]

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com


# 43d27483 05-Oct-2022 Ashish Kalra <ashish.kalra@amd.com>

ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init()

Change num_ghes from int to unsigned int, preventing an overflow
and causing subsequent vmalloc() to fail.

The overflow happens in ghes_estatus_pool_init() when calculating
len during execution of the statement below as both multiplication
operands here are signed int:

len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE);

The following call trace is observed because of this bug:

[ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1
[ 9.317131] Call Trace:
[ 9.317134] <TASK>
[ 9.317137] dump_stack_lvl+0x49/0x5f
[ 9.317145] dump_stack+0x10/0x12
[ 9.317146] warn_alloc.cold+0x7b/0xdf
[ 9.317150] ? __device_attach+0x16a/0x1b0
[ 9.317155] __vmalloc_node_range+0x702/0x740
[ 9.317160] ? device_add+0x17f/0x920
[ 9.317164] ? dev_set_name+0x53/0x70
[ 9.317166] ? platform_device_add+0xf9/0x240
[ 9.317168] __vmalloc_node+0x49/0x50
[ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0
[ 9.317176] vmalloc+0x21/0x30
[ 9.317177] ghes_estatus_pool_init+0x43/0xa0
[ 9.317179] acpi_hest_init+0x129/0x19c
[ 9.317185] acpi_init+0x434/0x4a4
[ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a
[ 9.317190] do_one_initcall+0x48/0x200
[ 9.317195] kernel_init_freeable+0x221/0x284
[ 9.317200] ? rest_init+0xe0/0xe0
[ 9.317204] kernel_init+0x1a/0x130
[ 9.317205] ret_from_fork+0x22/0x30
[ 9.317208] </TASK>

Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 415fed69 24-Sep-2022 Shuai Xue <xueshuai@linux.alibaba.com>

ACPI: APEI: do not add task_work to kernel thread to avoid memory leak

If an error is detected as a result of user-space process accessing a
corrupt memory location, the CPU may take an abort. Then the platform
firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA,
NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keep track of whether
memory_failure() work was queued, and make task_work pending to flush out
the queue so that the work is processed before return to user-space.

The code use init_mm to check whether the error occurs in user space:

if (current->mm != &init_mm)

The condition is always true, becase _nobody_ ever has "init_mm" as a real
VM any more.

In addition to abort, errors can also be signaled as asynchronous
exceptions, such as interrupt and SError. In such case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the work ghes_kick_task_work deferred to task_work will never
be processed because entry_handler returns to call ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node alloced from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
After around 200 allocations in our platform, the ghes_estatus_pool will
run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
result, the event failed to be processed.

sdei: event 805 on CPU 113 failed with error: -2

Finally, a lot of unhandled events may cause platform firmware to exceed
some threshold and reboot.

The condition should generally just do

if (current->mm)

as described in active_mm.rst documentation.

Then if an asynchronous error is detected when a kernel thread is running,
(e.g. when detected by a background scrubber), do not add task_work to it
as the original patch intends to do.

Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 27e932a3 27-Feb-2022 Shuai Xue <xueshuai@linux.alibaba.com>

ACPI: APEI: rename ghes_init() with an "acpi_" prefix

ghes_init() sticks out in acpi_init() because it is the only functions
without an "acpi_" prefix.

Rename ghes_init with an "acpi_" prefix, then all looks fine.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# dc4e8c07 27-Feb-2022 Shuai Xue <xueshuai@linux.alibaba.com>

ACPI: APEI: explicit init of HEST and GHES in apci_init()

From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus
memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage
the estatus memory pool. On the other hand, ghes_init() relies on
sdei_init() to detect the SDEI version and (un)register events. The
dependencies are as follows:

ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init()
ghes_init() => sdei_init()

HEST is not PCI-specific and initcall ordering is implicit and not
well-defined within a level.

Based on above, remove acpi_hest_init() from acpi_pci_root_init() and
convert ghes_init() and sdei_init() from initcalls to explicit calls in the
following order:

acpi_hest_init()
ghes_init()
sdei_init()

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3ad6fd77 26-Oct-2021 Tony Luck <tony.luck@intel.com>

x86/sgx: Add check for SGX pages to ghes_do_memory_failure()

SGX EPC pages do not have a "struct page" associated with them so the
pfn_valid() sanity check fails and results in a warning message to
the console.

Add an additional check to skip the warning if the address of the error
is in an SGX EPC page.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com


# ccb5ecdc 11-Jun-2021 Xiaofei Tan <tanxiaofei@huawei.com>

ACPI: APEI: fix synchronous external aborts in user-mode

Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
synchronise with APEI's irq work"), do_sea() would unconditionally
signal the affected task from the arch code. Since that change,
the GHES driver sends the signals.

This exposes a problem as errors the GHES driver doesn't understand
or doesn't handle effectively are silently ignored. It will cause
the errors get taken again, and circulate endlessly. User-space task
get stuck in this loop.

Existing firmware on Kunpeng9xx systems reports cache errors with the
'ARM Processor Error' CPER records.

Do memory failure handling for ARM Processor Error Section just like
for Memory Error Section.

Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work")
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 91989c70 16-Oct-2020 Jens Axboe <axboe@kernel.dk>

task_work: cleanup notification modes

A previous commit changed the notification mode from true/false to an
int, allowing notify-no, notify-yes, or signal-notify. This was
backwards compatible in the sense that any existing true/false user
would translate to either 0 (on notification sent) or 1, the latter
which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

Clean this up properly, and define a proper enum for the notification
mode. Now we have:

- TWA_NONE. This is 0, same as before the original change, meaning no
notification requested.
- TWA_RESUME. This is 1, same as before the original change, meaning
that we use TIF_NOTIFY_RESUME.
- TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
notification.

Clean up all the callers, switching their 0/1/false/true to using the
appropriate TWA_* mode for notifications.

Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9aa9cf3e 03-Sep-2020 Shiju Jose <shiju.jose@huawei.com>

ACPI / APEI: Add a notifier chain for unknown (vendor) CPER records

CPER records describing a firmware-first error are identified by GUID.
The ghes driver currently logs, but ignores any unknown CPER records.
This prevents describing errors that can't be represented by a standard
entry, that would otherwise allow a driver to recover from an error.
The UEFI spec calls these 'Non-standard Section Body' (N.2.3 of
version 2.8).

Add a notifier chain for these non-standard/vendor-records. Callers
must identify their type of records by GUID.

Record data is copied to memory from the ghes_estatus_pool to allow
us to keep it until after the notifier has run.

Co-developed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200903123456.1823-2-shiju.jose@huawei.com
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>


# 73f693c3 01-Jun-2020 Joerg Roedel <jroedel@suse.de>

mm: remove vmalloc_sync_(un)mappings()

These functions are not needed anymore because the vmalloc and ioremap
mappings are now synchronized when they are created or torn down.

Remove all callers and function definitions.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 7f17b4a1 01-May-2020 James Morse <james.morse@arm.com>

ACPI: APEI: Kick the memory_failure() queue for synchronous errors

memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component, (e.g. the memory controller), and notified via an IRQ.
In this case the work is queued as not all of memory_failure()s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware first
system it is replayed using NOTIFY_SEA.

This notification has NMI like properties, (it can interrupt
IRQ-masked code), so the memory_failure() work is queued. If we
return to user-space before the queued memory_failure() work is
processed, we will take the fault again. This loop may cause platform
firmware to exceed some threshold and reboot when Linux could have
recovered from this error.

For NMIlike notifications keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of
the ghes_estatus_node, and free()ing it back to the pool is deferred.

Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 763802b5 21-Mar-2020 Joerg Roedel <jroedel@suse.de>

x86/mm: split vmalloc_sync_all()

Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path. While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

* vmalloc_sync_mappings(), and
* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized. The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# cea79e7e 08-Jan-2020 Bhaskar Upadhaya <bupadhaya@marvell.com>

apei/ghes: Do not delay GHES polling

Currently, the ghes_poll_func() timer callback is registered with the
TIMER_DEFERRABLE flag. Thus, it is run when the CPU eventually wakes
up together with a subsequent non-deferrable timer and not at the precisely
configured polling interval.

For polling mode, the polling interval configured by firmware should not
be exceeded according to the ACPI spec 6.3, Table 18-394. The definition
of the polling interval is:

"Indicates the poll interval in milliseconds OSPM should use to
periodically check the error source for the presence of an error
condition."

If this interval is extended due to the timer callback deferring, error
records can get lost. Which we are observing on our ThunderX2 platforms.

Therefore, remove the TIMER_DEFERRABLE flag so that the timer callback
executes at the precise interval.

Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com>
[ bp: Subject & changelog ]
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 933ca4e3 17-Oct-2019 Kefeng Wang <wangkefeng.wang@huawei.com>

acpi: Use pr_warn instead of pr_warning

As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of
pr_warning"), removing pr_warning so all logging messages use a
consistent <prefix>_warn style. Let's do it.

Link: http://lkml.kernel.org/r/20191018031850.48498-8-wangkefeng.wang@huawei.com
To: linux-kernel@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek@suse.com: two more indentation fixes]
Signed-off-by: Petr Mladek <pmladek@suse.com>


# 6abc7622 15-Jul-2019 Liguang Zhang <zhangliguang@linux.alibaba.com>

ACPI / APEI: Release resources if gen_pool_add() fails

Destroy ghes_estatus_pool and release memory allocated via vmalloc() on
errors in ghes_estatus_pool_init() in order to avoid memory leaks.

[ bp: do the labels properly and with descriptive names and massage. ]

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1563173924-47479-1-git-send-email-zhangliguang@linux.alibaba.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bb100b64 17-Jul-2019 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

ACPI / APEI: Get rid of NULL_UUID_LE constant

This is a missed part of the commit 5b53696a30d5
("ACPI / APEI: Switch to use new generic UUID API"), i.e.
replacing old definition with a global constant variable.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 371b8689 24-Jun-2019 Liguang Zhang <zhangliguang@linux.alibaba.com>

ACPI / APEI: Remove needless __ghes_check_estatus() calls

Function __ghes_check_estatus() is always called after
__ghes_peek_estatus(), but it is already called in __ghes_peek_estatus().
So we should remove some needless __ghes_check_estatus() calls.

Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 1802d0be 27-May-2019 Thomas Gleixner <tglx@linutronix.de>

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174

Based on 1 normalized pattern(s):

this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details

extracted by the scancode license scanner the SPDX license identifier

GPL-2.0-only

has been chosen to replace the boilerplate/reference in 655 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# f9f05395 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Add support for the SDEI GHES Notification type

If the GHES notification type is SDEI, register the provided event
using the SDEI-GHES helper.

SDEI may be one of two types of event, normal and critical. Critical
events can interrupt normal events, so these must have separate
fixmap slots and locks in case both event types are in use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b972d2ea 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications

Now that ghes notification helpers provide the fixmap slots and
take the lock themselves, multiple NMI-like notifications can
be used on arm64.

These should be named after their notification method as they can't
all be called 'NMI'. x86's NOTIFY_NMI already is, change the SEA
fixmap entry to be called FIX_APEI_GHES_SEA.

Future patches can add support for FIX_APEI_GHES_SEI and
FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}.

Because all of ghes.c builds on both architectures, provide a
constant for each fixmap entry that the architecture will never
use.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# d9f608dc 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry()

Each struct ghes has an worst-case sized buffer for storing the
estatus. If an error is being processed by ghes_proc() in process
context this buffer will be in use. If the error source then triggers
an NMI-like notification, the same buffer will be used by
in_nmi_queue_one_entry() to stage the estatus data, before
__process_error() copys it into a queued estatus entry.

Merge __process_error()s work into in_nmi_queue_one_entry() so that
the queued estatus entry is used from the beginning. Use the new
ghes_peek_estatus() to know how much memory to allocate from
the ghes_estatus_pool before reading the records.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>

Change since v6:
* Added a comment explaining the 'ack-error, then goto no_work'.
* Added missing esatus-clearing, which is necessary after reading the GAS,
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e00a6e33 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length

ghes_read_estatus() reads the record address, then the record's
header, then performs some sanity checks before reading the
records into the provided estatus buffer.

To provide this estatus buffer the caller must know the size of the
records in advance, or always provide a worst-case sized buffer as
happens today for the non-NMI notifications.

Add a function to peek at the record's header to find the size. This
will let the NMI path allocate the right amount of memory before reading
the records, instead of using the worst-case size, and having to copy
the records.

Split ghes_read_estatus() to create __ghes_peek_estatus() which
returns the address and size of the CPER records.

Signed-off-by: James Morse <james.morse@arm.com>

Changes since v7:
* Grammar
* concistent argument ordering

Changes since v6:
* Additional buf_addr = 0 error handling
* Moved checking out of peek-estatus
* Reworded an error message so we can tell them apart
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f2a681b9 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Make GHES estatus header validation more user friendly

ghes_read_estatus() checks various lengths in the top-level header to
ensure the CPER records to be read aren't obviously corrupt.

Take the opportunity to make this more user-friendly, printing a
(ratelimited) message about the nature of the header format error.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: James Morse <james.morse@arm.com>
[ rjw: Add missing 'static' ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# f2a7e059 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Pass ghes and estatus separately to avoid a later copy

The NMI-like notifications scribble over ghes->estatus, before
copying it somewhere else. If this interrupts the ghes_probe() code
calling ghes_proc() on each struct ghes, the data is corrupted.

All the NMI-like notifications should use a queued estatus entry
from the beginning, instead of the ghes version, then copying it.
To do this, break up any use of "ghes->estatus" so that all
functions take the estatus as an argument.

This patch just moves these ghes->estatus dereferences into separate
arguments, no change in behaviour. struct ghes becomes unused in
ghes_clear_estatus() as it only wanted ghes->estatus, which we now
pass directly. This is removed.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# b484079b 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Let the notification helper specify the fixmap slot

ghes_copy_tofrom_phys() uses a different fixmap slot depending on in_nmi().
This doesn't work when there are multiple NMI-like notifications, that
could interrupt each other.

As with the locking, move the chosen fixmap_idx to the notification helper.
This only matters for NMI-like notifications, anything calling
ghes_proc() can use the IRQ fixmap slot as its already holding an irqsave
spinlock.

This lets us collapse the ghes_ioremap_pfn_*() helpers.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3b880cbe 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Move locking to the notification helper

ghes_copy_tofrom_phys() takes different locks depending on in_nmi().
This doesn't work if there are multiple NMI-like notifications, that
can interrupt each other.

Now that NOTIFY_SEA is always called in the same context, move the
lock-taking to the notification helper. The helper will always know
which lock to take. This avoids ghes_copy_tofrom_phys() taking a guess
based on in_nmi().

This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All
the other notifications use ghes_proc(), and are called in process
or IRQ context. Move the spin_lock_irqsave() around their ghes_proc()
calls.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 255097c8 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue

Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's NOTIFY_NMI.

Remove Kconfig's ability to turn ACPI_APEI_SEA off if ACPI_APEI_GHES
is selected. This roughly matches the x86 NOTIFY_NMI behaviour, and means
each architecture has at least one user of the estatus-queue, meaning it
doesn't need guarding with ifdef.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9c9d0805 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI

The estatus-queue code is currently hidden by the NOTIFY_NMI #ifdefs.
Once NOTIFY_SEA starts using the estatus-queue we can stop hiding
it as each architecture has a user that can't be turned off.

Split the existing CONFIG_HAVE_ACPI_APEI_NMI block in two, and move
the SEA code into the gap.

Move the code around ... and changes the stale comment describing
why the status queue is necessary: printk() is no longer the issue,
its the helpers like memory_failure_queue() that aren't nmi safe.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 06ddeadc 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors

During ghes_proc() we use ghes_ack_error() to tell an external agent
we are done with these records and it can re-use the memory.

rc may hold an error returned by ghes_read_estatus(), ENOENT causes
us to skip ghes_ack_error() (as there is nothing to ack), but rc may
also by EIO, which gets supressed.

ghes_clear_estatus() is where we mark the records as processed for
non GHESv2 error sources, and already spots the ENOENT case as
buf_paddr is set to 0 by ghes_read_estatus().

Move the ghes_ack_error() call in here to avoid extra logic with
the return code in ghes_proc().

This enables GHESv2 acking for NMI-like error sources. This is safe
as the buffer is pre-mapped by map_gen_v2() before the GHES is added
to any NMI handler lists.

This same pre-mapping step means we can't receive an error from
apei_read()/write() here as apei_check_gar() succeeded when it
was mapped, and the mapping was cached, so the address can't be
rejected at runtime. Remove the error-returns as this is now
called from a function with no return.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ee2eb3d4 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Generalise the estatus queue's notify code

Refactor the estatus queue's pool notification routine from
NOTIFY_NMI's handlers. This will allow another notification
method to use the estatus queue without duplicating this code.

Add rcu_read_lock()/rcu_read_unlock() around the list
list_for_each_entry_rcu() walker. These aren't strictly necessary as
the whole nmi_enter/nmi_exit() window is a spooky RCU read-side
critical section.

in_nmi_queue_one_entry() is separate from the rcu-list walker for a
later caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
[ rjw: Drop unnecessary err variable in two places ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 5cc6c682 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Don't update struct ghes' flags in read/clear estatus

ghes_read_estatus() sets a flag in struct ghes if the buffer of
CPER records needs to be cleared once the records have been
processed. This flag value is a problem if a struct ghes can be
processed concurrently, as happens at probe time if an NMI arrives
for the same error source. The NMI clears the flag, meaning the
interrupted handler may never do the ghes_estatus_clear() work.

The GHES_TO_CLEAR flags is only set at the same time as
buffer_paddr, which is now owned by the caller and passed to
ghes_clear_estatus(). Use this value as the flag.

A non-zero buf_paddr returned by ghes_read_estatus() means
ghes_clear_estatus() should clear this address. ghes_read_estatus()
already checks for a read of error_status_address being zero,
so CPER records cannot be written here.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 7d49f2c7 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Remove spurious GHES_TO_CLEAR check

ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going
on to __process_error(). This is pointless as ghes_read_estatus()
will always set this flag if it returns success, which was checked
earlier in the loop. Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# eeb25557 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Don't store CPER records physical address in struct ghes

When CPER records are found the address of the records is stashed
in the struct ghes. Once the records have been processed, this
address is overwritten with zero so that it won't be processed
again without being re-populated by firmware.

This goes wrong if a struct ghes can be processed concurrently,
as can happen at probe time when an NMI occurs. If the NMI arrives
on another CPU, the probing CPU may call ghes_clear_estatus() on the
records before the handler had finished with them.
Even on the same CPU, once the interrupted handler is resumed, it
will call ghes_clear_estatus() on the NMIs records, this memory may
have already been re-used by firmware.

Avoid this stashing by letting the caller hold the address. A
later patch will do away with the use of ghes->flags in the
read/clear code too.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# fb7be08f 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Make estatus pool allocation a static size

Adding new NMI-like notifications duplicates the calls that grow
and shrink the estatus pool. This is all pretty pointless, as the
size is capped to 64K. Allocate this for each ghes and drop
the code that grows and shrinks the pool.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e147133a 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Make hest.c manage the estatus memory pool

ghes.c has a memory pool it uses for the estatus cache and the estatus
queue. The cache is initialised when registering the platform driver.
For the queue, an NMI-like notification has to grow/shrink the pool
as it is registered and unregistered.

This is all pretty noisy when adding new NMI-like notifications, it
would be better to replace this with a static pool size based on the
number of users.

As a precursor, move the call that creates the pool from ghes_init(),
into hest.c. Later this will take the number of ghes entries and
consolidate the queue allocations.
Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put
this.

The pool is now initialised as part of ACPI's subsys_initcall():
(acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init())
Before this patch it happened later as a GHES specific device_initcall().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 0ac234be 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Switch estatus pool to use vmalloc memory

The ghes code is careful to parse and round firmware's advertised
memory requirements for CPER records, up to a maximum of 64K.
However when ghes_estatus_pool_expand() does its work, it splits
the requested size into PAGE_SIZE granules.

This means if firmware generates 5K of CPER records, and correctly
describes this in the table, __process_error() will silently fail as it
is unable to allocate more than PAGE_SIZE.

Switch the estatus pool to vmalloc() memory. On x86 vmalloc() memory
may fault and be fixed up by vmalloc_fault(). To prevent this call
vmalloc_sync_all() before an NMI handler could discover the memory.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 93066e9a 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Remove silent flag from ghes_read_estatus()

Subsequent patches will split up ghes_read_estatus(), at which
point passing around the 'silent' flag gets annoying. This is to
suppress prink() messages, which prior to commit 42a0bb3f7138
("printk/nmi: generic solution for safe printk in NMI"), were
unsafe in NMI context.

This is no longer necessary, remove the flag. printk() messages
are batched in a per-cpu buffer and printed via irq-work, or a call
back from panic().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 78b0b690 29-Jan-2019 James Morse <james.morse@arm.com>

ACPI / APEI: Don't wait to serialise with oops messages when panic()ing

oops_begin() exists to group printk() messages with the oops message
printed by die(). To reach this caller we know that platform firmware
took this error first, then notified the OS via NMI with a 'panic'
severity.

Don't wait for another CPU to release the die-lock before panic()ing,
our only goal is to print this fatal error and panic().

This code is always called in_nmi(), and since commit 42a0bb3f7138
("printk/nmi: generic solution for safe printk in NMI"), it has been
safe to call printk() from this context. Messages are batched in a
per-cpu buffer and printed via irq-work, or a call back from panic().

Link: https://patchwork.kernel.org/patch/10313555/
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 98cff8b2 19-Dec-2018 Lenny Szubowicz <lszubowi@redhat.com>

ACPI/APEI: Clear GHES block_status before panic()

In __ghes_panic() clear the block status in the APEI generic
error status block for that generic hardware error source before
calling panic() to prevent a second panic() in the crash kernel
for exactly the same fatal error.

Otherwise ghes_probe(), running in the crash kernel, would see
an unhandled error in the APEI generic error status block and
panic again, thereby precluding any crash dump.

Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Signed-off-by: David Arcari <darcari@redhat.com>
Tested-by: Tyler Baicar <baicar.tyler@gmail.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 305d0e00 30-Apr-2018 Alexandru Gagniuc <mr.nuke.me@gmail.com>

EDAC, ghes: Remove unused argument to ghes_edac_report_mem_error()

The use of the @ghes argument was removed in a previous commit, but
function signature was not updated to reflect this.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180430213358.8319-1-mr.nuke.me@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>


# cc7f3f13 23-Apr-2018 Borislav Petkov <bp@suse.de>

ghes, EDAC: Fix ghes_edac registration

Tony reported seeing

"Internal error: Can't find EDAC structure"

when injecting correctable errors due to the fact that ghes_edac would
still load even if the whitelist won't hit. Drop the pr_err() in
ghes_edac_report_mem_error() for now due to the hacky way how ghes_edac
depends on ghes.c.

While at it, make ghes_edac_register() return an error if it doesn't hit
in the whitelist as it is the only sensible thing to do in that
situation.

Furthermore, move the call to it to happen last in ghes_probe() so that
GHES initializing properly does not depend on ghes_edac init at all
as latter is only reporting errors and not required for GHES's proper
functioning.

Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Tested-by: Sughosh Ganu <sughosh.ganu@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20180420182015.zao3olss4tvvlxki@agluck-desk


# 83b57531 09-Jul-2017 Eric W. Biederman <ebiederm@xmission.com>

mm/memory_failure: Remove unused trapno from memory_failure

Today 4 architectures set ARCH_SUPPORTS_MEMORY_FAILURE (arm64, parisc,
powerpc, and x86), while 4 other architectures set __ARCH_SI_TRAPNO
(alpha, metag, sparc, and tile). These two sets of architectures do
not interesect so remove the trapno paramater to remove confusion.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


# 24bc8f03 15-Oct-2017 Colin Ian King <colin.king@canonical.com>

ACPI / APEI: remove redundant variables len and node_len

Variables len and node_len are redundant and can be removed. Cleans
up clang warning: node_len = GHES_ESTATUS_NODE_LEN(len);

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 9852ce9a 28-Nov-2017 Tyler Baicar <tbaicar@codeaurora.org>

ACPI: APEI: call into AER handling regardless of severity

Currently the GHES code only calls into the AER driver for
recoverable type errors. This is incorrect because errors of
other severities do not get logged by the AER driver and do not
get exposed to user space via the AER trace event. So, call
into the AER driver for PCIe errors regardless of the severity

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 3c5b977f 28-Nov-2017 Tyler Baicar <tbaicar@codeaurora.org>

ACPI: APEI: handle PCIe AER errors in separate function

Move PCIe AER error handling code into a separate function.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 520e18a5 06-Nov-2017 James Morse <james.morse@arm.com>

ACPI / APEI: Remove ghes_ioremap_area

Now that nothing is using the ghes_ioremap_area pages, rip them out.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>


# 4f89fa28 06-Nov-2017 James Morse <james.morse@arm.com>

ACPI / APEI: Replace ioremap_page_range() with fixmap

Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range()
with __set_fixmap() as ioremap_page_range() may sleep to allocate a new
level of page-table, even if its passed an existing final-address to
use in the mapping.

The GHES driver can only be enabled for architectures that select
HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64.

clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64
and __set_pte_vaddr() for x86. In each case its the same as the
respective arch_apei_flush_tlb_one().

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Tested-by: Toshi Kani <toshi.kani@hpe.com>
[ For the arm64 bits: ]
Acked-by: Will Deacon <will.deacon@arm.com>
[ For the x86 bits: ]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: All applicable <stable@vger.kernel.org>


# d5272003 12-Oct-2017 Kees Cook <keescook@chromium.org>

ACPI / APEI: Convert timers to use timer_setup()

In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tyler Baicar <tbaicar@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
Cc: Shiju Jose <shiju.jose@huawei.com>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>


# c49870e8 17-Oct-2017 Dongjiu Geng <gengdongjiu@huawei.com>

ACPI / APEI: remove the unused dead-code for SEA/NMI notification type

For the SEA notification, the two functions ghes_sea_add() and
ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA
is defined. If not, it will return errors in the ghes_probe()
and not continue. If the probe is failed, the ghes_sea_remove()
also has no chance to be called. Hence, remove the unnecessary
handling when CONFIG_ACPI_APEI_SEA is not defined.

For the NMI notification, it has the same issue as SEA notification,
so also remove the unused dead-code for it.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 095f613c 25-Sep-2017 Jan Beulich <JBeulich@suse.com>

ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq()

Match up with what 7edda0886b ("acpi: apei: handle SEA notification
type for ARMv8") did for ghes_ioremap_pfn_nmi().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# aaf2c2fb 28-Aug-2017 Tyler Baicar <tbaicar@codeaurora.org>

ACPI / APEI: clear error status before acknowledging the error

Currently we acknowledge errors before clearing the error status.
This could cause a new error to be populated by firmware in-between
the error acknowledgment and the error status clearing which would
cause the second error's status to be cleared without being handled.
So, clear the error status before acknowledging the errors.

Also, make sure to acknowledge the error if the error status read
fails.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e931d0da 29-Aug-2017 Punit Agrawal <punitagrawal@gmail.com>

ACPI / APEI: Suppress message if HEST not present

According to the ACPI specification, firmware is not required to provide
the Hardware Error Source Table (HEST). When HEST is not present, the
following superfluous message is printed to the kernel boot log -

[ 3.460067] GHES: HEST is not enabled!

Extend hest_disable variable to track whether the firmware provides this
table and if it is not present skip any log output. The existing
behaviour is preserved in all other cases.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# bdb9458a 21-Jul-2017 Loc Ho <lho@apm.com>

ACPI: APEI: Enable APEI multiple GHES source to share a single external IRQ

X-Gene platforms describe multiple GHES error sources with the same
hardware error notification type (external interrupt) and interrupt
number.

Change the GHES interrupt request to support sharing the same IRQ.

This change includs contributions from Tuan Phan <tphan@apm.com>.

Signed-off-by: Loc Ho <lho@apm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 77b246b3 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

acpi: apei: check for pending errors when probing GHES entries

Check for pending errors when probing GHES entries. It is possible
that a fatal error is already pending at this point, so we should
handle it as soon as the driver is probed. This also avoids a
potential issue if there was an interrupt that was already
cleared for an error since the GHES driver wasn't present.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 621f48e4 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

arm/arm64: KVM: add guest SEA support

Currently external aborts are unsupported by the guest abort
handling. Add handling for SEAs so that the host kernel reports
SEAs which occur in the guest kernel.

When an SEA occurs in the guest kernel, the guest exits and is
routed to kvm_handle_guest_abort(). Prior to this patch, a print
message of an unsupported FSC would be printed and nothing else
would happen. With this patch, the code gets routed to the APEI
handling of SEAs in the host kernel to report the SEA information.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# e9279e83 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

trace, ras: add ARM processor error trace event

Currently there are trace events for the various RAS
errors with the exception of ARM processor type errors.
Add a new trace event for such errors so that the user
will know when they occur. These trace events are
consistent with the ARM processor error section type
defined in UEFI 2.6 spec section N.2.4.4.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 297b64c7 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

ras: acpi / apei: generate trace event for unrecognized CPER section

The UEFI spec includes non-standard section type support in the
Common Platform Error Record. This is defined in section N.2.3 of
UEFI version 2.5.

Currently if the CPER section's type (UUID) does not match any
section type that the kernel knows how to parse, a trace event is
not generated.

Generate a trace event which contains the raw error data for
non-standard section type error records.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Tested-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 2fb5853e 21-Jun-2017 Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>

acpi: apei: panic OS with fatal error status block

Even if an error status block's severity is fatal, the kernel does not
honor the severity level and panic.

With the firmware first model, the platform could inform the OS about a
fatal hardware error through the non-NMI GHES notification type. The OS
should panic when a hardware error record is received with this
severity.

Call panic() after CPER data in error status block is printed if
severity is fatal, before each error section is handled.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 7edda088 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

acpi: apei: handle SEA notification type for ARMv8

ARM APEI extension proposal added SEA (Synchronous External Abort)
notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.
An SEA can interrupt code that had interrupts masked and is treated as
an NMI. To aid this the page of address space for mapping APEI buffers
while in_nmi() is always reserved, and ghes_ioremap_pfn_nmi() is
changed to use the helper methods to find the prot_t to map with in
the same way as ghes_ioremap_pfn_irq().

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# bbcc2e7b 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

ras: acpi/apei: cper: add support for generic data v3 structure

The ACPI 6.1 spec adds a new revision of the generic error data
entry structure. Add support to handle the new structure as well
as properly verify and iterate through the generic data entries.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 42aa5604 21-Jun-2017 Tyler Baicar <tbaicar@codeaurora.org>

acpi: apei: read ack upon ghes record consumption

A RAS (Reliability, Availability, Serviceability) controller
may be a separate processor running in parallel with OS
execution, and may generate error records for consumption by
the OS. If the RAS controller produces multiple error records,
then they may be overwritten before the OS has consumed them.

The Generic Hardware Error Source (GHES) v2 structure
introduces the capability for the OS to acknowledge the
consumption of the error record generated by the RAS
controller. A RAS controller supporting GHESv2 shall wait for
the acknowledgment before writing a new error record, thus
eliminating the race condition.

Add support for parsing of GHESv2 sub-tables as well.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>


# 5b53696a 05-Jun-2017 Andy Shevchenko <andriy.shevchenko@linux.intel.com>

ACPI / APEI: Switch to use new generic UUID API

There are new types and helpers that are supposed to be used in new code.

As a preparation to get rid of legacy types and API functions do
the conversion here.

Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


# 7bf130e4 19-May-2017 Shiju Jose <shiju.jose@huawei.com>

ACPI/APEI: Handle GSIV and GPIO notification types

System Controller Interrupts are received by ACPI's error device, which
in turn notifies the GHES code. The same is true of APEI's GSIV and
GPIO notification types. Add support for GSIV and GPIO sharing the SCI
register/unregister/notifier code. Rename the list and notifier to show
this is no longer just SCI, but anything from the Hardware Error Device.

Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
[ Rewrite commit log. ]
Signed-off-by: James Morse <james.morse@arm.com>
[ Some small cleanups ontop. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: James Morse <james.morse@arm.com>
Link: http://lkml.kernel.org/r/86258A5CC0A3704780874CF6004BA8A62E695201@FRAEML521-MBX.china.huawei.com
Cc: "Guohanjun (Hanjun Guo)" <guohanjun@huawei.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "Zhengqiang (turing)" <zhengqiang10@huawei.com>
Cc: "fu.wei@linaro.org" <fu.wei@linaro.org>
Cc: "xuwei (O)" <xuwei5@hisilicon.com>
Cc: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7237c75b 18-Apr-2017 Geliang Tang <geliangtang@gmail.com>

ACPI/APEI: Use setup_deferrable_timer()

Use setup_deferrable_timer() instead of init_timer_deferrable() to
simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Link: http://lkml.kernel.org/r/3afa5498142ef68256023257dad37b9f8352e65e.1489060803.git.geliangtang@gmail.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


# 7d64f82c 16-Mar-2017 James Morse <james.morse@arm.com>

ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal

When removing a GHES device notified by SCI, list_del_rcu() is used,
ghes_remove() should call synchronize_rcu() before it goes on to call
kfree(ghes), otherwise concurrent RCU readers may still hold this list
entry after it has been freed.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 81e88fdc432a (ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# e6017571 01-Feb-2017 Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h>

We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
will have to be picked up from other headers and .c files.

Create a trivial placeholder <linux/sched/clock.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# a545715d 30-Nov-2016 Prarit Bhargava <prarit@redhat.com>

ACPI / APEI: Fix NMI notification handling

When removing and adding cpu 0 on a system with GHES NMI the following stack
trace is seen when re-adding the cpu:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
Call Trace:
dump_stack+0x63/0x8e
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
setup_local_APIC+0x275/0x370
apic_ap_setup+0xe/0x20
start_secondary+0x48/0x180
set_init_arg+0x55/0x55
early_idt_handler_array+0x120/0x120
x86_64_start_reservations+0x2a/0x2c
x86_64_start_kernel+0x13d/0x14c

During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0. The GHES NMI handler, ghes_notify_nmi() runs the
ghes_proc_irq_work work queue which ends up setting IRQ_WORK_VECTOR
(0xf6). The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.

Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.

This patch modifies the ghes_proc_irq_work() to run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only calling the work queue if
NMI_HANDLED has been set.

Fixes: 2383844d4850 (GHES: Elliminate double-loop in the NMI handler)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 806487a8 18-Oct-2016 Punit Agrawal <punitagrawal@gmail.com>

ACPI / APEI: Fix incorrect return value of ghes_proc()

Although ghes_proc() tests for errors while reading the error status,
it always return success (0). Fix this by propagating the return
value.

Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 2458d66b 14-Sep-2016 Tyler Baicar <tbaicar@codeaurora.org>

ACPI / APEI: Send correct severity to calculate AER severity

Currently the AER severity is calculated by calling cper_severity_to_aer(),
but the parameter sent is actually the GHES severity. This causes the AER
severity to be incorrect.

Fix the parameter to be the CPER severity instead of the GHES severity.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Borislav Petkov <bp@suse.de>


# 020bf066 14-Feb-2016 Paul Gortmaker <paul.gortmaker@windriver.com>

drivers/acpi: make apei/ghes.c more explicitly non-modular

The Kconfig currently controlling compilation of this code is:

config ACPI_APEI_GHES
bool "APEI Generic Hardware Error Source"

...meaning that it currently is not being built as a module by anyone.

Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We replace module.h with moduleparam.h as we are keeping the
pre-existing module_param that the file has, as currently that is
the easiest way to maintain compatibility with the existing boot
arg use cases.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 8ece249a 04-Sep-2015 Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>

acpi/apei: Use appropriate pgprot_t to map GHES memory

If the ACPI APEI firmware handles hardware error first (called
"firmware first handling"), the firmware updates the GHES memory
region with hardware error record (called "generic hardware
error record"). Essentially the firmware writes hardware error
records in the GHES memory region, triggers an NMI/interrupt,
then the GHES driver goes off and grabs the error record from
the GHES region.

The kernel currently maps the GHES memory region as cacheable
(PAGE_KERNEL) for all architectures. However, on some arm64
platforms, there is a mismatch between how the kernel maps the
GHES region (PAGE_KERNEL) and how the firmware maps it
(EFI_MEMORY_UC, ie. uncacheable), leading to the possibility of
the kernel GHES driver reading stale data from the cache when it
receives the interrupt.

With stale data being read, the kernel is unaware there is new
hardware error to be handled when there actually is; this may
lead to further damage in various scenarios, such as error
propagation caused data corruption. If uncorrected error (such
as double bit ECC error) happened in memory operation and if the
kernel is unaware of such an event happening, errorneous data may
be propagated to the disk.

Instead GHES memory region should be mapped with page protection
type according to what is returned from arch_apei_get_mem_attribute().

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
[ Small stylistic tweaks. ]
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441372302-23242-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 4c62dbbc 26-Jun-2015 Jarkko Nikula <jarkko.nikula@linux.intel.com>

ACPI: Remove FSF mailing addresses

There is no need to carry potentially outdated Free Software Foundation
mailing address in file headers since the COPYING file includes it.

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 6fe9e7c2 27-Mar-2015 Jiri Kosina <jkosina@suse.cz>

GHES: Make NMI handler have a single reader

Since GHES sources are global, we theoretically need only a single CPU
reading them per NMI instead of a thundering herd of CPUs waiting on a
spinlock in NMI context for no reason at all.

Do that.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>


# 2383844d 18-Mar-2015 Borislav Petkov <bp@suse.de>

GHES: Elliminate double-loop in the NMI handler

There's no real need to iterate twice over the HW error sources in the
NMI handler. With the previous cleanups, elliminating the second loop is
almost trivial.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 6169ddf8 18-Mar-2015 Borislav Petkov <bp@suse.de>

GHES: Panic right after detection

The moment we log an error of panic severity, there's no need to noodle
through the ghes_nmi list anymore. So panic instead right then and
there.

Signed-off-by: Borislav Petkov <bp@suse.de>


# e10be03f 18-Mar-2015 Borislav Petkov <bp@suse.de>

GHES: Carve out the panic functionality

... into another function for more clarity. No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 11568496 18-Mar-2015 Borislav Petkov <bp@suse.de>

GHES: Carve out error queueing in a separate function

Make the handler more readable.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 8f7c31f6 29-Sep-2014 Borislav Petkov <bp@suse.de>

GHES: Make ghes_estatus_caches static

It is used only in ghes.c.

Signed-off-by: Borislav Petkov <bp@suse.de>


# 8d21d4c9 28-Jul-2014 Chen, Gong <gong.chen@linux.intel.com>

APEI, GHES: Cleanup unnecessary function for lockless list

We have a generic function to reverse a lockless list, kill homegrown
copy.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1406530260-26078-2-git-send-email-gong.chen@linux.intel.com
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
[ Boris: correct commit msg ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# e61bf8d0 20-Oct-2014 Wolfram Sang <wsa@kernel.org>

acpi: apei: drop owner assignment from platform_drivers

A platform_driver does not need to set an owner, it will be populated by the
driver core.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>


# 594c7255 22-Jul-2014 Tomasz Nowicki <tomasz.nowicki@linaro.org>

acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context.

GHES currently maps two pages with atomic_ioremap. From now
on, NMI is architectural depended so there is no need to allocate
an NMI page for platforms without NMI support.

To make it possible to not use a second page, swap the existing
page order so that the IRQ context page is first, and the optional
NMI context page is second. Then, use HAVE_ACPI_APEI_NMI to decide
how many pages are to be allocated.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 44a69f61 22-Jul-2014 Tomasz Nowicki <tomasz.nowicki@linaro.org>

acpi, apei, ghes: Make NMI error notification to be GHES architecture extension.

Currently APEI depends on x86 architecture. It is because of NMI hardware
error notification of GHES which is currently supported by x86 only.
However, many other APEI features can be still used perfectly by other
architectures.

This commit adds two symbols:
1. HAVE_ACPI_APEI for those archs which support APEI.
2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c
file. NMI related data and functions are grouped so they can be wrapped
inside one #ifdef section. Appropriate function stubs are provided for
!NMI case.

Note there is no functional changes for x86 due to hard selected
HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 9dae3d0d 22-Jul-2014 Tomasz Nowicki <tomasz.nowicki@linaro.org>

apei, mce: Factor out APEI architecture specific MCE calls.

This commit abstracts MCE calls and provides weak corresponding default
implementation for those architectures which do not need arch specific
actions. Each platform willing to do additional architectural actions
should provides desired function definition. It allows us to avoid wrap
code into #ifdef in generic code and prevent new platform from introducing
dummy stub function too.

Initially, there are two APEI arch-specific calls:
- arch_apei_enable_cmcff()
- arch_apei_report_mem_error()
Both interact with MCE driver for X86 architecture.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 0a00fd5e 03-Jun-2014 Lv Zheng <lv.zheng@intel.com>

ACPICA: Restore error table definitions to reduce code differences between Linux and ACPICA upstream.

The following commit has changed ACPICA table header definitions:

Commit: 88f074f4871a8c212b212b725e4dcdcdb09613c1
Subject: ACPI, CPER: Update cper info

While such definitions are currently maintained in ACPICA. As the
modifications applying to the table definitions affect other OSPMs'
drivers, it is very difficult for ACPICA to initiate a process to
complete the merge. Thus this commit finally only leaves us divergences.

Revert such naming modifications to reduce the source code differecnes
between Linux and ACPICA upstream. No functional changes.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Chen, Gong <gong.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# ca104edc 25-Nov-2013 Chen, Gong <gong.chen@linux.intel.com>

ACPI, APEI, GHES: Cleanup ghes memory error handling

Cleanup the logic in ghes_handle_memory_failure(). While at it, add
proper PFN validity check for UC error and cleanup the code logic to
make it simpler and cleaner.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1385363701-12387-2-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# addccbb2 25-Nov-2013 Chen, Gong <gong.chen@linux.intel.com>

ACPI, APEI, GHES: Do not report only correctable errors with SCI

Currently SCI is employed to handle corrected errors - memory corrected
errors, more specifically but in fact SCI still can be used to handle
any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS.
Enable logging for those kinds of errors too.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com
[ Boris: massage commit message, rename function arg. ]
Signed-off-by: Borislav Petkov <bp@suse.de>


# 27d50c82 06-Dec-2013 Lv Zheng <lv.zheng@intel.com>

ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h>

To avoid build problems and breaking dependencies between ACPI header
files, <acpi/acpi.h> should not be included directly by code outside
of the ACPI core subsystem. However, that is possible if
<linux/acpi_io.h> is included, because that file contains
a direct inclusion of <acpi/acpi.h>.

For this reason, remove the direct <acpi/acpi.h> inclusion from
<linux/acpi_io.h>, move that file from include/linux/ to include/acpi/
and make <linux/acpi.h> include it for CONFIG_ACPI set along with the
other ACPI header files. Accordingly, Remove the inclusions of
<linux/acpi_io.h> from everywhere.

Of course, that causes the contents of the new <acpi/acpi_io.h> file
to be available for CONFIG_ACPI set only, so intel_opregion.o that
depends on it should also depend on CONFIG_ACPI (and it really should
not be compiled for CONFIG_ACPI unset anyway).

References: https://01.org/linuxgraphics/sites/default/files/documentation/acpi_igd_opregion_spec.pdf
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 147de147 18-Oct-2013 Chen, Gong <gong.chen@linux.intel.com>

ACPI, APEI, CPER: Add UEFI 2.4 support for memory error

In latest UEFI spec(by now it is 2.4) memory error definition
for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
adds some new fields. These fields help people to locate
memory error to an actual DIMM location.

Original-author: Tony Luck <tony.luck@intel.com>
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 88f074f4 18-Oct-2013 Chen, Gong <gong.chen@linux.intel.com>

ACPI, CPER: Update cper info

We have a lot of confusing names of functions and data structures in
amongs the the error reporting code. In particular the "apei" prefix
has been applied to many objects that are not part of APEI. Since we
will be using these routines for extended error log reporting it will
be clearer if we fix up the names first.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# cf870c70 10-Jul-2013 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

mce: acpi/apei: Soft-offline a page on firmware GHES notification

If the firmware indicates in GHES error data entry that the error threshold
has exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 0ba98ec9 06-Jun-2013 Betty Dall <betty.dall@hp.com>

ACPI / APEI: Force fatal AER severity when component has been reset

The CPER error record has a reset bit that indicates that the platform
has reset the component. The reset bit can be set for any severity
error including recoverable. From the AER code path's perspective,
any error is fatal if the component has been reset. This patch
upgrades the severity of the AER recovery to AER_FATAL whenever the
CPER error record indicates that the component has been reset.

[bhelgaas: s/bus has been reset/component has been reset/]
Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


# a98d4f64 02-Jun-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn>

ACPI / APEI: fix error return code in ghes_probe()

Fix to return a negative error code in the acpi_gsi_to_irq() and
request_irq() error handling case instead of 0, as done elsewhere
in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 37448adf 30-May-2013 Lance Ortiz <lance.ortiz@hp.com>

aerdrv: Move cper_print_aer() call out of interrupt context

The following warning was seen on 3.9 when a corrected PCIe error was being
handled by the AER subsystem.

WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()

This occurred because a call to pci_get_domain_bus_and_slot() was added to
cper_print_pcie() to setup for the call to cper_print_aer(). The warning
showed up because cper_print_pcie() is called in an interrupt context and
pci_get* functions are not supposed to be called in that context.

The solution is to move the cper_print_aer() call out of the interrupt
context and into aer_recover_work_func() to avoid any warnings when calling
pci_get* functions.

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>


# 21480547 15-Feb-2013 Mauro Carvalho Chehab <mchehab@kernel.org>

ghes: add the needed hooks for EDAC error report

In order to allow reporting errors via EDAC, add hooks for:

1) register an EDAC driver;
2) unregister an EDAC driver;
3) report errors via EDAC.

As the EDAC driver will need to access the ghes structure, adds it
as one of the parameters for ghes_do_proc.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# 40e06415 15-Feb-2013 Mauro Carvalho Chehab <mchehab@kernel.org>

ghes: move structures/enum to a header file

As a ghes_edac driver will need to access ghes structures, in order
to properly handle the errors, move those structures to a separate
header file. No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>


# da095fd3 19-Nov-2012 Bill Pemberton <wfp5p@virginia.edu>

acpi: remove use of __devinit

CONFIG_HOTPLUG is going away as an option so __devinit is no longer
needed.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# b59bc2fb 21-Nov-2012 Bill Pemberton <wfp5p@virginia.edu>

ACPI: remove use of __devexit

CONFIG_HOTPLUG is going away as an option so __devexit is no
longer needed.

Signed-off-by: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


# 34ddeb03 11-Jun-2012 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Avoid too much error reporting in runtime

This patch fixed the following bug.

https://bugzilla.kernel.org/show_bug.cgi?id=43282

This is caused by a firmware bug checking (checking generic address
register provided by firmware) in runtime. The checking should be
done in address mapping time instead of runtime to avoid too much
error reporting in runtime.

Reported-by: Pawel Sikora <pluto@agmk.net>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Tested-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>


# 700130b4 07-Nov-2011 Myron Stowe <mstowe@redhat.com>

ACPI APEI: Convert atomicio routines

APEI needs memory access in interrupt context. The obvious choice is
acpi_read(), but originally it couldn't be used in interrupt context
because it makes temporary mappings with ioremap(). Therefore, we added
drivers/acpi/atomicio.c, which provides:
acpi_pre_map_gar() -- ioremap in process context
acpi_atomic_read() -- memory access in interrupt context
acpi_post_unmap_gar() -- iounmap

Later we added acpi_os_map_generic_address() (2971852) and enhanced
acpi_read() so it works in interrupt context as long as the address has
been previously mapped (620242a). Now this sequence:
acpi_os_map_generic_address() -- ioremap in process context
acpi_read()/apei_read() -- now OK in interrupt context
acpi_os_unmap_generic_address()
is equivalent to what atomicio.c provides.

This patch introduces apei_read() and apei_write(), which currently are
functional equivalents of acpi_read() and acpi_write(). This is mainly
proactive, to prevent APEI breakages if acpi_read() and acpi_write()
are ever augmented to support the 'bit_offset' field of GAS, as APEI's
__apei_exec_write_register() precludes splitting up functionality
related to 'bit_offset' and APEI's 'mask' (see its
APEI_EXEC_PRESERVE_REGISTER block).

With apei_read() and apei_write() in place, usages of atomicio routines
are converted to apei_read()/apei_write() and existing calls within
osl.c and the CA, based on the re-factoring that was done in an earlier
patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2:
acpi_pre_map_gar() --> acpi_os_map_generic_address()
acpi_post_unmap_gar() --> acpi_os_unmap_generic_address()
acpi_atomic_read() --> apei_read()
acpi_atomic_write() --> apei_write()

Note that acpi_read() and acpi_write() currently use 'bit_width'
for accessing GARs which seems incorrect. 'bit_width' is the size of
the register, while 'access_width' is the size of the access the
processor must generate on the bus. The 'access_width' may be larger,
for example, if the hardware only supports 32-bit or 64-bit reads. I
wanted to minimize any possible impacts with this patch series so I
did *not* change this behavior.

Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 46d12f0b 07-Dec-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Printk queued error record before panic

Because printk is not safe inside NMI handler, the recoverable error
records received in NMI handler will be queued to be printked in a
delayed IRQ context via irq_work. If a fatal error occurs after the
recoverable error and before the irq_work processed, we lost a error
report.

To solve the issue, the queued error records are printked in NMI
handler if system will go panic.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5ba82ab5 07-Dec-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES, Distinguish interleaved error report in kernel log

In most cases, printk only guarantees messages from different printk
calling will not be interleaved between each other. But, one APEI
GHES hardware error report will involve multiple printk calling,
normally each for one line. So it is possible that the hardware error
report comes from different generic hardware error source will be
interleaved.

In this patch, a sequence number is prefixed to each line of error
report. So that, even if they are interleaved, they still can be
distinguished by the prefixed sequence number.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# a654e5ee 07-Dec-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES: Add PCIe AER recovery support

aer_recover_queue() is called when recoverable PCIe AER errors are
notified by firmware to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 90ab5ee9 12-Jan-2012 Rusty Russell <rusty@rustcorp.com.au>

module_param: make bool parameters really bool (drivers & misc)

module_param(bool) used to counter-intuitively take an int. In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option. For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


# 9c48f1c6 30-Sep-2011 Don Zickus <dzickus@redhat.com>

x86, nmi: Wire up NMI handlers to new routines

Just convert all the files that have an nmi handler to the new routines.
Most of it is straight forward conversion. A couple of places needed some
tweaking like kgdb which separates the debug notifier from the nmi handler
and mce removes a call to notify_die.

[Thanks to Ying for finding out the history behind that mce call

https://lkml.org/lkml/2010/5/27/114

And Boris responding that he would like to remove that call because of it

https://lkml.org/lkml/2011/9/21/163]

The things that get converted are the registeration/unregistration routines
and the nmi handler itself has its args changed along with code removal
to check which list it is on (most are on one NMI list except for kgdb
which has both an NMI routine and an NMI Unknown routine).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Corey Minyard <minyard@acm.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Jack Steiner <steiner@sgi.com>
Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>


# 70cb6e1d 02-Aug-2011 Len Brown <len.brown@intel.com>

APEI GHES: 32-bit buildfix

drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression

ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
  in function ghes_estatus_cache_add().

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Len Brown <len.brown@intel.com>


# ba61ca4a 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES: Add hardware memory error recovery support

memory_failure_queue() is called when recoverable memory errors are
notified by firmware to do the recovery work.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 152cef40 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES, Error records content based throttle

printk is used by GHES to report hardware errors. Ratelimit is
enforced on the printk to avoid too many hardware error reports in
kernel log. Because there may be thousands or even millions of
corrected hardware errors during system running.

Currently, a simple scheme is used. That is, the total number of
hardware error reporting is ratelimited. This may cause some issues
in practice.

For example, there are two kinds of hardware errors occurred in
system. One is corrected memory error, because the fault memory
address is accessed frequently, there may be hundreds error report
per-second. The other is corrected PCIe AER error, it will be
reported once per-second. Because they share one ratelimit control
structure, it is highly possible that only memory error is reported.

To avoid the above issue, an error record content based throttle
algorithm is implemented in the patch. Where after the first
successful reporting, all error records that are same are throttled for
some time, to let other kinds of error records have the opportunity to
be reported.

In above example, the memory errors will be throttled for some time,
after being printked. Then the PCIe AER error will be printked
successfully.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 67eb2e99 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES, printk support for recoverable error via NMI

Some APEI GHES recoverable errors are reported via NMI, but printk is
not safe in NMI context.

To solve the issue, a lock-less memory allocator is used to allocate
memory in NMI handler, save the error record into the allocated
memory, put the error record into a lock-less list. On the other
hand, an irq_work is used to delay the operation from NMI context to
IRQ context. The irq_work IRQ handler will remove nodes from
lock-less list, printk the error record and do some further processing
include recovery operation, then free the memory.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 9fb0bfe1 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Add WHEA _OSC support

APEI firmware first mode must be turned on explicitly on some
machines, otherwise there may be no GHES hardware error record for
hardware error notification. APEI bit in generic _OSC call can be
used to do that, but on some machine, a special WHEA _OSC call must be
used. This patch adds the support to that WHEA _OSC call.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# b6a95016 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES, Support disable GHES at boot time

Some machine may have broken firmware so that GHES and firmware first
mode should be disabled. This patch adds support to that.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 5588340d 12-Jul-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic

printk is used by GHES to report hardware errors. Normally, the
printk will be ratelimited to avoid too many hardware error reports in
kernel log. Because there may be thousands or even millions of
corrected hardware errors during system running.

That is different for fatal hardware error, because system will go
panic as soon as possible, there will be no more than several error
records. And these error records are valuable for system fault
diagnosis, so they should not be ratelimited.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 25985edc 30-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi>

Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


# 81e88fdc 11-Jan-2011 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support

Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

This patch adds POLL/IRQ/NMI notification types support.

Because the memory area used to transfer hardware error information
from BIOS to Linux can be determined only in NMI, IRQ or timer
handler, but general ioremap can not be used in atomic context, so a
special version of atomic ioremap is implemented for that.

Known issue:

- Error information can not be printed for recoverable errors notified
via NMI, because printk is not NMI-safe. Will fix this via delay
printing to IRQ context via irq_work or make printk NMI-safe.

v2:

- adjust printk format per comments.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 32c361f5 06-Dec-2010 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Report GHES error information via printk

printk is one of the methods to report hardware errors to user space.
This patch implements hardware error reporting for GHES via printk.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 1dd6b20e 29-Sep-2010 Jin Dongming <jin.dongming@np.css.fujitsu.com>

ACPI, APEI, HEST Fix the unsuitable usage of platform_data

platform_data in hest_parse_ghes() is used for saving the address of entry
information of erst_tab. When the device is failed to be added, platform_data
will be freed by platform_device_put(). But the value saved in platform_data
should not be freed here. If it is done, it will make system panic.

So I think platform_data should save the address of allocated memory
which saves entry information of erst_tab.

This patch fixed it and I confirmed it on x86_64 next-tree.

v2:
Transport the pointer of hest_hdr to platform_data using
platform_device_add_data()

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# 7ad6e943 02-Aug-2010 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Manage GHES as platform devices

Register GHES during HEST initialization as platform devices. And make
GHES driver into platform device driver. So that the GHES driver
module can be loaded automatically when there are GHES available.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# ad4ecef2 02-Aug-2010 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Rename CPER and GHES severity constants

The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>


# d334a491 18-May-2010 Huang Ying <ying.huang@intel.com>

ACPI, APEI, Generic Hardware Error Source memory error support

Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.

Now, only SCI notification type and memory errors are supported. More
notification type and hardware error type will be added later. These
memory errors are reported to user space through /dev/mcelog via
faking a corrected Machine Check, so that the error memory page can be
offlined by /sbin/mcelog if the error count for one page is beyond the
threshold.

On some machines, Machine Check can not report physical address for
some corrected memory errors, but GHES can do that. So this simplified
GHES is implemented firstly.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>