Cross Reference: /linux-master/drivers/acpi/apei/ghes.c

History log of /linux-master/drivers/acpi/apei/ghes.c
Revision	Date	Author	Comments
# f2f212f3	20-Nov-2023	Uwe Kleine-König <u.kleine-koenig@pengutronix.de>	ACPI: APEI: GHES: Convert to platform remove callback returning void The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Instead of returning an error code, emit a better error message than the core. Apart from the improved error message this patch has no effects for the driver. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# f3e6b3ae	18-Feb-2024	Dan Williams <dan.j.williams@intel.com>	acpi/ghes: Remove CXL CPER notifications Initial tests with the CXL CPER implementation identified that error reports were being duplicated in the log and the trace event [1]. Then it was discovered that the notification handler took sleeping locks while the GHES event handling runs in spin_lock_irqsave() context [2] While the duplicate reporting was fixed in v6.8-rc4, the fix for the sleeping-lock-vs-atomic collision would enjoy more time to settle and gain some test cycles. Given how late it is in the development cycle, remove the CXL hookup for now and try again during the next merge window. Note that end result is that v6.8 does not emit CXL CPER payloads to the kernel log, but this is in line with the CXL trend to move error reporting to trace events instead of the kernel log. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: http://lore.kernel.org/r/20240108165855.00002f5a@Huawei.com [1] Closes: http://lore.kernel.org/r/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com>
# 54ce1927	31-Jan-2024	Ira Weiny <ira.weiny@intel.com>	cxl/cper: Fix errant CPER prints for CXL events Jonathan reports that CXL CPER events dump an extra generic error message. {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1 {1}[Hardware Error]: event severity: recoverable {1}[Hardware Error]: Error 0, type: recoverable {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6 {1}[Hardware Error]: section length: 0x90 {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................ {1}[Hardware Error]: 00000010: 00100000 00000000 00040000 00000000 ................ ... CXL events were rerouted though the CXL subsystem for additional processing. However, when that work was done it was missed that cper_estatus_print_section() continued with a generic error message which is confusing. Teach CPER print code to ignore printing details of some section types. Assign the CXL event GUIDs to this set to prevent confusing unknown prints. Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
# 671a794c	20-Dec-2023	Ira Weiny <ira.weiny@intel.com>	acpi/ghes: Process CXL Component Events BIOS can configure memory devices as firmware first. This will send CXL events to the firmware instead of the OS. The firmware can then send these events to the OS via UEFI. UEFI v2.10 section N.2.14 defines a Common Platform Error Record (CPER) format for CXL Component Events. The format is mostly the same as the CXL Common Event Record Format. The difference is the use of a GUID in the Section Type rather than a UUID as part of the event itself. Add GHES support to detect CXL CPER records and call a registered callback with the event. A notifier chain was considered for the callback but the complexity did not justify the use case as only the CXL subsystem requires this event. Enforce that only one callback can be registered at any time. Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-7-1bb8a4ca2c7a@intel.com [djbw: fixup checkpatch errors] Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
# a70297d2	17-Dec-2023	Shuai Xue <xueshuai@linux.alibaba.com>	ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events There are two major types of uncorrected recoverable (UCR) errors : - Synchronous error: The error is detected and raised at the point of the consumption in the execution flow, e.g. when a CPU tries to access a poisoned cache line. The CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64 and Machine Check Exception (MCE) on X86. OS requires to take action (for example, offline failure page/kill failure thread) to recover this uncorrectable error. - Asynchronous error: The error is detected out of processor execution context, e.g. when an error is detected by a background scrubber. Some data in the memory are corrupted. But the data have not been consumed. OS is optional to take action to recover this uncorrectable error. When APEI firmware first is enabled, a platform may describe one error source for the handling of synchronous errors (e.g. MCE or SEA notification ), or for handling asynchronous errors (e.g. SCI or External Interrupt notification). In other words, we can distinguish synchronous errors by APEI notification. For synchronous errors, kernel will kill the current process which accessing the poisoned page by sending SIGBUS with BUS_MCEERR_AR. In addition, for asynchronous errors, kernel will notify the process who owns the poisoned page by sending SIGBUS with BUS_MCEERR_AO in early kill mode. However, the GHES driver always sets mf_flags to 0 so that all synchronous errors are handled as asynchronous errors in memory failure. To this end, set memory failure flags as MF_ACTION_REQUIRED on synchronous events. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# e2abc47a	20-Sep-2023	Shiju Jose <shiju.jose@huawei.com>	ACPI: APEI: Fix AER info corruption when error status data has multiple sections ghes_handle_aer() passes AER data to the PCI core for logging and recovery by calling aer_recover_queue() with a pointer to struct aer_capability_regs. The problem was that aer_recover_queue() queues the pointer directly without copying the aer_capability_regs data. The pointer was to the ghes->estatus buffer, which could be reused before aer_recover_work_func() reads the data. To avoid this problem, allocate a new aer_capability_regs structure from the ghes_estatus_pool, copy the AER data from the ghes->estatus buffer into it, pass a pointer to the new struct to aer_recover_queue(), and free it after aer_recover_work_func() has processed it. Reported-by: Bjorn Helgaas <helgaas@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 9368aa18	19-May-2023	Li Yang <leoyang.li@nxp.com>	APEI: GHES: correctly return NULL for ghes_get_devices() Since 315bada690e0 ("EDAC: Check for GHES preference in the chipset-specific EDAC drivers"), vendor specific EDAC driver will not probe correctly when CONFIG_ACPI_APEI_GHES is enabled but no GHES device is present. Make ghes_get_devices() return NULL when the GHES device list is empty to fix the problem. Fixes: 9057a3f7ac36 ("EDAC/ghes: Prepare to make ghes_edac a proper module") Signed-off-by: Li Yang <leoyang.li@nxp.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 43482701	27-May-2023	Miaohe Lin <linmiaohe@huawei.com>	ACPI: APEI: GHES: Remove unused ghes_estatus_pool_size_request() ghes_estatus_pool_size_request() is unused now, so remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 292a089d	20-Dec-2022	Steven Rostedt (Google) <rostedt@goodmis.org>	treewide: Convert del_timer() to timer_shutdown() Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); \| - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); \| kmem_cache_free(slab, ptr); \| kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# dd3fa54b	24-Oct-2022	Ard Biesheuvel <ardb@kernel.org>	apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() Some documentation first, about how this machinery works: It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does: /* * GHES error status reporting throttle, to report more kinds of * errors, instead of just most frequently occurred errors. / New elements are added to the cache this way: if (!ghes_estatus_cached(estatus)) { if (ghes_print_estatus(NULL, ghes->generic, estatus)) ghes_estatus_cache_add(ghes->generic, estatus); The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported. Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme. And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really is* that element that gets removed and not one which replaced it in the meantime. Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available. Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped. And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case. This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally. Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all. [ bp: - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood; - Touch up code style. ] Co-developed-by: Jia He <justin.he@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 36006ccb	13-Oct-2022	Uwe Kleine-König <u.kleine-koenig@pengutronix.de>	ACPI: APEI: Drop unsetting driver data on remove Since commit 0998d0631001 ("device-core: Ensure drvdata = NULL when no driver is bound") the driver core cares for cleaning driver data, so don't do it in the driver, too. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 0d2aa70b	18-Oct-2022	Ard Biesheuvel <ardb@kernel.org>	apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg() Some documentation first, about how this machinery works: It seems, the intent of the GHES error records cache is to collect already reported errors - see the ghes_estatus_cached() checks. There's even a sentence trying to say what this does: /* * GHES error status reporting throttle, to report more kinds of * errors, instead of just most frequently occurred errors. / New elements are added to the cache this way: if (!ghes_estatus_cached(estatus)) { if (ghes_print_estatus(NULL, ghes->generic, estatus)) ghes_estatus_cache_add(ghes->generic, estatus); The intent being, once this new error record is reported, it gets cached so that it doesn't get reported for a while due to too many, same-type error records getting reported in burst-like scenarios. I.e., new, unreported error types can have a higher chance of getting reported. Now, the loop in ghes_estatus_cache_add() is trying to pick out the oldest element in there. Meaning, something which got reported already but a long while ago, i.e., a LRU-type scheme. And the cmpxchg() is there presumably to make sure when that selected element slot_cache is removed, it really is* that element that gets removed and not one which replaced it in the meantime. Now, ghes_estatus_cache_add() selects a slot, and either succeeds in replacing its contents with a pointer to a newly cached item, or it just gives up and frees the new item again, without attempting to select another slot even if one might be available. Since only inserting new items is being done here, the race can only cause a failure if the selected slot was updated with another new item concurrently, which means that it is arbitrary which of those two items gets dropped. And "dropped" here means, the item doesn't get added to the cache so the next time it is seen, it'll get reported again and an insertion attempt will be done again. Eventually, it'll get inserted and all those times when the insertion fails, the item will get reported although the cache is supposed to prevent that and "ratelimit" those repeated error records. Not a big deal in any case. This means the cmpxchg() and the special case are not necessary. Therefore, just drop the existing item unconditionally. Move the xchg_release() and call_rcu() out of rcu_read_lock/unlock section since there is no actually dereferencing the pointer at all. [ bp: - Flesh out and summarize what was discussed on the thread now that that cache contraption is understood; - Touch up code style. ] Co-developed-by: Jia He <justin.he@arm.com> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-7-justin.he@arm.com
# 802e7f1d	09-Oct-2022	Jia He <justin.he@arm.com>	EDAC/ghes: Make ghes_edac a proper module Commit dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") introduced a bug leading to ghes_edac_register() to be invoked before edac_init(). Because at that time the bus "edac" hadn't been even registered, this created sysfs nodes as /devices/mc0 instead of /sys/devices/system/edac/mc/mc0 on an Ampere eMag server. Fix this by turning ghes_edac into a proper module. The list of GHES devices returned is not protected from being modified concurrently but it is pretty static as it gets created only during GHES init and latter is not a module so... [ bp: Massage. ] Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()") Co-developed-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-5-justin.he@arm.com
# 9057a3f7	09-Oct-2022	Jia He <justin.he@arm.com>	EDAC/ghes: Prepare to make ghes_edac a proper module To make ghes_edac a proper module, prepare to decouple its dependencies from GHES. Move the ghes_edac.force_load parameter to ghes.c in order to properly control whether ghes_edac should be force-loaded: In ghes_edac_register() it is too late to set the module flag. Introduce a helper ghes_get_devices(), which returns the list of GHES devices which got probed when the platform-check passes on the system. The previous force_load check is not needed in ghes_edac_unregister() since it will be checked in the module's init function of ghes_edac later. [ bp: Massage. ] Suggested-by: Toshi Kani <toshi.kani@hpe.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-4-justin.he@arm.com
# 8e40612f	09-Oct-2022	Jia He <justin.he@arm.com>	EDAC/ghes: Add a notifier for reporting memory errors In order to make it a proper module and disentangle it from facilities, add a notifier for reporting memory errors. Use an atomic notifier because calls sites like ghes_proc_in_irq() run in interrupt context. [ bp: Massage commit message. ] Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Jia He <justin.he@arm.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221010023559.69655-3-justin.he@arm.com
# 43d27483	05-Oct-2022	Ashish Kalra <ashish.kalra@amd.com>	ACPI: APEI: Fix integer overflow in ghes_estatus_pool_init() Change num_ghes from int to unsigned int, preventing an overflow and causing subsequent vmalloc() to fail. The overflow happens in ghes_estatus_pool_init() when calculating len during execution of the statement below as both multiplication operands here are signed int: len += (num_ghes * GHES_ESOURCE_PREALLOC_MAX_SIZE); The following call trace is observed because of this bug: [ 9.317108] swapper/0: vmalloc error: size 18446744071562596352, exceeds total pages, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0-1 [ 9.317131] Call Trace: [ 9.317134] <TASK> [ 9.317137] dump_stack_lvl+0x49/0x5f [ 9.317145] dump_stack+0x10/0x12 [ 9.317146] warn_alloc.cold+0x7b/0xdf [ 9.317150] ? __device_attach+0x16a/0x1b0 [ 9.317155] __vmalloc_node_range+0x702/0x740 [ 9.317160] ? device_add+0x17f/0x920 [ 9.317164] ? dev_set_name+0x53/0x70 [ 9.317166] ? platform_device_add+0xf9/0x240 [ 9.317168] __vmalloc_node+0x49/0x50 [ 9.317170] ? ghes_estatus_pool_init+0x43/0xa0 [ 9.317176] vmalloc+0x21/0x30 [ 9.317177] ghes_estatus_pool_init+0x43/0xa0 [ 9.317179] acpi_hest_init+0x129/0x19c [ 9.317185] acpi_init+0x434/0x4a4 [ 9.317188] ? acpi_sleep_proc_init+0x2a/0x2a [ 9.317190] do_one_initcall+0x48/0x200 [ 9.317195] kernel_init_freeable+0x221/0x284 [ 9.317200] ? rest_init+0xe0/0xe0 [ 9.317204] kernel_init+0x1a/0x130 [ 9.317205] ret_from_fork+0x22/0x30 [ 9.317208] </TASK> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 415fed69	24-Sep-2022	Shuai Xue <xueshuai@linux.alibaba.com>	ACPI: APEI: do not add task_work to kernel thread to avoid memory leak If an error is detected as a result of user-space process accessing a corrupt memory location, the CPU may take an abort. Then the platform firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA, NOTIFY_SOFTWARE_DELEGATED, etc. For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") keep track of whether memory_failure() work was queued, and make task_work pending to flush out the queue so that the work is processed before return to user-space. The code use init_mm to check whether the error occurs in user space: if (current->mm != &init_mm) The condition is always true, becase _nobody_ ever has "init_mm" as a real VM any more. In addition to abort, errors can also be signaled as asynchronous exceptions, such as interrupt and SError. In such case, the interrupted current process could be any kind of thread. When a kernel thread is interrupted, the work ghes_kick_task_work deferred to task_work will never be processed because entry_handler returns to call ret_to_kernel() instead of ret_to_user(). Consequently, the estatus_node alloced from ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed. After around 200 allocations in our platform, the ghes_estatus_pool will run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a result, the event failed to be processed. sdei: event 805 on CPU 113 failed with error: -2 Finally, a lot of unhandled events may cause platform firmware to exceed some threshold and reboot. The condition should generally just do if (current->mm) as described in active_mm.rst documentation. Then if an asynchronous error is detected when a kernel thread is running, (e.g. when detected by a background scrubber), do not add task_work to it as the original patch intends to do. Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors") Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 27e932a3	27-Feb-2022	Shuai Xue <xueshuai@linux.alibaba.com>	ACPI: APEI: rename ghes_init() with an "acpi_" prefix ghes_init() sticks out in acpi_init() because it is the only functions without an "acpi_" prefix. Rename ghes_init with an "acpi_" prefix, then all looks fine. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# dc4e8c07	27-Feb-2022	Shuai Xue <xueshuai@linux.alibaba.com>	ACPI: APEI: explicit init of HEST and GHES in apci_init() From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage the estatus memory pool. On the other hand, ghes_init() relies on sdei_init() to detect the SDEI version and (un)register events. The dependencies are as follows: ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init() ghes_init() => sdei_init() HEST is not PCI-specific and initcall ordering is implicit and not well-defined within a level. Based on above, remove acpi_hest_init() from acpi_pci_root_init() and convert ghes_init() and sdei_init() from initcalls to explicit calls in the following order: acpi_hest_init() ghes_init() sdei_init() Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 3ad6fd77	26-Oct-2021	Tony Luck <tony.luck@intel.com>	x86/sgx: Add check for SGX pages to ghes_do_memory_failure() SGX EPC pages do not have a "struct page" associated with them so the pfn_valid() sanity check fails and results in a warning message to the console. Add an additional check to skip the warning if the address of the error is in an SGX EPC page. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-8-tony.luck@intel.com
# ccb5ecdc	11-Jun-2021	Xiaofei Tan <tanxiaofei@huawei.com>	ACPI: APEI: fix synchronous external aborts in user-mode Before commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work"), do_sea() would unconditionally signal the affected task from the arch code. Since that change, the GHES driver sends the signals. This exposes a problem as errors the GHES driver doesn't understand or doesn't handle effectively are silently ignored. It will cause the errors get taken again, and circulate endlessly. User-space task get stuck in this loop. Existing firmware on Kunpeng9xx systems reports cache errors with the 'ARM Processor Error' CPER records. Do memory failure handling for ARM Processor Error Section just like for Memory Error Section. Fixes: 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work") Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [ rjw: Subject edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 91989c70	16-Oct-2020	Jens Axboe <axboe@kernel.dk>	task_work: cleanup notification modes A previous commit changed the notification mode from true/false to an int, allowing notify-no, notify-yes, or signal-notify. This was backwards compatible in the sense that any existing true/false user would translate to either 0 (on notification sent) or 1, the latter which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2. Clean this up properly, and define a proper enum for the notification mode. Now we have: - TWA_NONE. This is 0, same as before the original change, meaning no notification requested. - TWA_RESUME. This is 1, same as before the original change, meaning that we use TIF_NOTIFY_RESUME. - TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the notification. Clean up all the callers, switching their 0/1/false/true to using the appropriate TWA_* mode for notifications. Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
# 9aa9cf3e	03-Sep-2020	Shiju Jose <shiju.jose@huawei.com>	ACPI / APEI: Add a notifier chain for unknown (vendor) CPER records CPER records describing a firmware-first error are identified by GUID. The ghes driver currently logs, but ignores any unknown CPER records. This prevents describing errors that can't be represented by a standard entry, that would otherwise allow a driver to recover from an error. The UEFI spec calls these 'Non-standard Section Body' (N.2.3 of version 2.8). Add a notifier chain for these non-standard/vendor-records. Callers must identify their type of records by GUID. Record data is copied to memory from the ghes_estatus_pool to allow us to keep it until after the notifier has run. Co-developed-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20200903123456.1823-2-shiju.jose@huawei.com Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
# 73f693c3	01-Jun-2020	Joerg Roedel <jroedel@suse.de>	mm: remove vmalloc_sync_(un)mappings() These functions are not needed anymore because the vmalloc and ioremap mappings are now synchronized when they are created or torn down. Remove all callers and function definitions. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Link: http://lkml.kernel.org/r/20200515140023.25469-7-joro@8bytes.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# 7f17b4a1	01-May-2020	James Morse <james.morse@arm.com>	ACPI: APEI: Kick the memory_failure() queue for synchronous errors memory_failure() offlines or repairs pages of memory that have been discovered to be corrupt. These may be detected by an external component, (e.g. the memory controller), and notified via an IRQ. In this case the work is queued as not all of memory_failure()s work can happen in IRQ context. If the error was detected as a result of user-space accessing a corrupt memory location the CPU may take an abort instead. On arm64 this is a 'synchronous external abort', and on a firmware first system it is replayed using NOTIFY_SEA. This notification has NMI like properties, (it can interrupt IRQ-masked code), so the memory_failure() work is queued. If we return to user-space before the queued memory_failure() work is processed, we will take the fault again. This loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. For NMIlike notifications keep track of whether memory_failure() work was queued, and make task_work pending to flush out the queue. To save memory allocations, the task_work is allocated as part of the ghes_estatus_node, and free()ing it back to the pool is deferred. Signed-off-by: James Morse <james.morse@arm.com> Tested-by: Tyler Baicar <baicar@os.amperecomputing.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 763802b5	21-Mar-2020	Joerg Roedel <jroedel@suse.de>	x86/mm: split vmalloc_sync_all() Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the vunmap() code-path. While this change was necessary to maintain correctness on x86-32-pae kernels, it also adds additional cycles for architectures that don't need it. Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported severe performance regressions in micro-benchmarks because it now also calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But the vmalloc_sync_all() implementation on x86-64 is only needed for newly created mappings. To avoid the unnecessary work on x86-64 and to gain the performance back, split up vmalloc_sync_all() into two functions: * vmalloc_sync_mappings(), and * vmalloc_sync_unmappings() Most call-sites to vmalloc_sync_all() only care about new mappings being synchronized. The only exception is the new call-site added in the above mentioned commit. Shile Zhang directed us to a report of an 80% regression in reaim throughput. Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()") Reported-by: kernel test robot <oliver.sang@intel.com> Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES] Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/ Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
# cea79e7e	08-Jan-2020	Bhaskar Upadhaya <bupadhaya@marvell.com>	apei/ghes: Do not delay GHES polling Currently, the ghes_poll_func() timer callback is registered with the TIMER_DEFERRABLE flag. Thus, it is run when the CPU eventually wakes up together with a subsequent non-deferrable timer and not at the precisely configured polling interval. For polling mode, the polling interval configured by firmware should not be exceeded according to the ACPI spec 6.3, Table 18-394. The definition of the polling interval is: "Indicates the poll interval in milliseconds OSPM should use to periodically check the error source for the presence of an error condition." If this interval is extended due to the timer callback deferring, error records can get lost. Which we are observing on our ThunderX2 platforms. Therefore, remove the TIMER_DEFERRABLE flag so that the timer callback executes at the precise interval. Signed-off-by: Bhaskar Upadhaya <bupadhaya@marvell.com> [ bp: Subject & changelog ] Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 933ca4e3	17-Oct-2019	Kefeng Wang <wangkefeng.wang@huawei.com>	acpi: Use pr_warn instead of pr_warning As said in commit f2c2cbcc35d4 ("powerpc: Use pr_warn instead of pr_warning"), removing pr_warning so all logging messages use a consistent <prefix>_warn style. Let's do it. Link: http://lkml.kernel.org/r/20191018031850.48498-8-wangkefeng.wang@huawei.com To: linux-kernel@vger.kernel.org Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: James Morse <james.morse@arm.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [pmladek@suse.com: two more indentation fixes] Signed-off-by: Petr Mladek <pmladek@suse.com>
# 6abc7622	15-Jul-2019	Liguang Zhang <zhangliguang@linux.alibaba.com>	ACPI / APEI: Release resources if gen_pool_add() fails Destroy ghes_estatus_pool and release memory allocated via vmalloc() on errors in ghes_estatus_pool_init() in order to avoid memory leaks. [ bp: do the labels properly and with descriptive names and massage. ] Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/1563173924-47479-1-git-send-email-zhangliguang@linux.alibaba.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# bb100b64	17-Jul-2019	Andy Shevchenko <andriy.shevchenko@linux.intel.com>	ACPI / APEI: Get rid of NULL_UUID_LE constant This is a missed part of the commit 5b53696a30d5 ("ACPI / APEI: Switch to use new generic UUID API"), i.e. replacing old definition with a global constant variable. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 371b8689	24-Jun-2019	Liguang Zhang <zhangliguang@linux.alibaba.com>	ACPI / APEI: Remove needless __ghes_check_estatus() calls Function __ghes_check_estatus() is always called after __ghes_peek_estatus(), but it is already called in __ghes_peek_estatus(). So we should remove some needless __ghes_check_estatus() calls. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 1802d0be	27-May-2019	Thomas Gleixner <tglx@linutronix.de>	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 174 Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 655 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070034.575739538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# f9f05395	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Add support for the SDEI GHES Notification type If the GHES notification type is SDEI, register the provided event using the SDEI-GHES helper. SDEI may be one of two types of event, normal and critical. Critical events can interrupt normal events, so these must have separate fixmap slots and locks in case both event types are in use. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# b972d2ea	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications Now that ghes notification helpers provide the fixmap slots and take the lock themselves, multiple NMI-like notifications can be used on arm64. These should be named after their notification method as they can't all be called 'NMI'. x86's NOTIFY_NMI already is, change the SEA fixmap entry to be called FIX_APEI_GHES_SEA. Future patches can add support for FIX_APEI_GHES_SEI and FIX_APEI_GHES_SDEI_{NORMAL,CRITICAL}. Because all of ghes.c builds on both architectures, provide a constant for each fixmap entry that the architecture will never use. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# d9f608dc	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry() Each struct ghes has an worst-case sized buffer for storing the estatus. If an error is being processed by ghes_proc() in process context this buffer will be in use. If the error source then triggers an NMI-like notification, the same buffer will be used by in_nmi_queue_one_entry() to stage the estatus data, before __process_error() copys it into a queued estatus entry. Merge __process_error()s work into in_nmi_queue_one_entry() so that the queued estatus entry is used from the beginning. Use the new ghes_peek_estatus() to know how much memory to allocate from the ghes_estatus_pool before reading the records. Reported-by: Borislav Petkov <bp@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Change since v6: * Added a comment explaining the 'ack-error, then goto no_work'. * Added missing esatus-clearing, which is necessary after reading the GAS, Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# e00a6e33	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length ghes_read_estatus() reads the record address, then the record's header, then performs some sanity checks before reading the records into the provided estatus buffer. To provide this estatus buffer the caller must know the size of the records in advance, or always provide a worst-case sized buffer as happens today for the non-NMI notifications. Add a function to peek at the record's header to find the size. This will let the NMI path allocate the right amount of memory before reading the records, instead of using the worst-case size, and having to copy the records. Split ghes_read_estatus() to create __ghes_peek_estatus() which returns the address and size of the CPER records. Signed-off-by: James Morse <james.morse@arm.com> Changes since v7: * Grammar * concistent argument ordering Changes since v6: * Additional buf_addr = 0 error handling * Moved checking out of peek-estatus * Reworded an error message so we can tell them apart Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# f2a681b9	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Make GHES estatus header validation more user friendly ghes_read_estatus() checks various lengths in the top-level header to ensure the CPER records to be read aren't obviously corrupt. Take the opportunity to make this more user-friendly, printing a (ratelimited) message about the nature of the header format error. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: James Morse <james.morse@arm.com> [ rjw: Add missing 'static' ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# f2a7e059	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Pass ghes and estatus separately to avoid a later copy The NMI-like notifications scribble over ghes->estatus, before copying it somewhere else. If this interrupts the ghes_probe() code calling ghes_proc() on each struct ghes, the data is corrupted. All the NMI-like notifications should use a queued estatus entry from the beginning, instead of the ghes version, then copying it. To do this, break up any use of "ghes->estatus" so that all functions take the estatus as an argument. This patch just moves these ghes->estatus dereferences into separate arguments, no change in behaviour. struct ghes becomes unused in ghes_clear_estatus() as it only wanted ghes->estatus, which we now pass directly. This is removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# b484079b	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Let the notification helper specify the fixmap slot ghes_copy_tofrom_phys() uses a different fixmap slot depending on in_nmi(). This doesn't work when there are multiple NMI-like notifications, that could interrupt each other. As with the locking, move the chosen fixmap_idx to the notification helper. This only matters for NMI-like notifications, anything calling ghes_proc() can use the IRQ fixmap slot as its already holding an irqsave spinlock. This lets us collapse the ghes_ioremap_pfn_*() helpers. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
# 3b880cbe	29-Jan-2019	James Morse <james.morse@arm.com>	ACPI / APEI: Move locking to the notification helper ghes_copy_tofrom_phys() takes different locks depending on in_nmi(). This doesn't work if there are multiple NMI-like notifications, that can interrupt each other. Now that NOTIFY_SEA is always called in the same context, move the lock-taking to the notification helper. The helper will always know which lock to take. This avoids ghes_copy_tofrom_phys() taking a guess based on in_nmi(). This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All the other notifications use ghes_proc(), and are called in process or IRQ context. Move the spin_lock_irqsave() around their ghes_proc() calls. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>