History log of /linux-master/arch/arm64/kernel/image-vars.h
Revision Date Author Comments
# 9684ec18 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: Enable LPA2 at boot if supported by the system

Update the early kernel mapping code to take 52-bit virtual addressing
into account based on the LPA2 feature. This is a bit more involved than
LVA (which is supported with 64k pages only), given that some page table
descriptor bits change meaning in this case.

To keep the handling in asm to a minimum, the initial ID map is still
created with 48-bit virtual addressing, which implies that the kernel
image must be loaded into 48-bit addressable physical memory. This is
currently required by the boot protocol, even though we happen to
support placement outside of that for LVA/64k based configurations.

Enabling LPA2 involves more than setting TCR.T1SZ to a lower value: there
is also a DS bit in TCR that needs to be set, which changes the meaning of
bits [9:8] in all page table descriptors. Since we cannot set DS and update
every live page table descriptor at the same time, let's
pivot through another temporary mapping. This avoids the need to
reintroduce manipulations of the page tables with the MMU and caches
disabled.
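
For illustration only (not the kernel's actual code), the TCR_EL1
programming described above boils down to something like the following;
the helper name is invented and only the field positions follow the Arm ARM:

#define TCR_T1SZ_SHIFT	16		/* TCR_EL1.T1SZ occupies bits [21:16] */
#define TCR_DS		(1UL << 59)	/* TCR_EL1.DS selects the LPA2 format */

static unsigned long tcr_for_va_bits(unsigned int va_bits)
{
	/* T1SZ encodes the TTBR1 region size as 64 - VA bits */
	unsigned long tcr = (unsigned long)(64 - va_bits) << TCR_T1SZ_SHIFT;

	if (va_bits == 52)
		tcr |= TCR_DS;	/* repurposes bits [9:8] of the descriptors */
	return tcr;
}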

To permit the LPA2 feature to be overridden on the kernel command line,
which may be necessary to work around silicon errata, or to deal with
mismatched features on heterogeneous SoC designs, test for CPU feature
overrides first, and only then enable LPA2.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-78-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 68aec33f 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: mm: Add feature override support for LVA

Add support for overriding the VARange field of the MMFR2 CPU ID
register. This permits the associated LVA feature to be overridden early
enough for the boot code that creates the kernel mapping to take it into
account.

Given that LPA2 implies LVA, disabling the latter should disable the
former as well. So also override the ID_AA64MMFR0.TGran field for the
current page size if it advertises support for 52-bit addressing.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-71-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 9cce9c6c 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: mm: Handle LVA support as a CPU feature

Currently, we detect CPU support for 52-bit virtual addressing (LVA)
extremely early, before creating the kernel page tables or enabling the
MMU. We cannot override the feature this early, and so large virtual
addressing is always enabled on CPUs that implement support for it if
the software support for it was enabled at build time. It also means we
rely on non-trivial code in asm to deal with this feature.

Given that both the ID map and the TTBR1 mapping of the kernel image are
guaranteed to be 48-bit addressable, it is not actually necessary to
enable support this early, and instead, we can model it as a CPU
feature. That way, we can rely on code patching to get the correct
TCR.T1SZ values programmed on secondary boot and resume from suspend.

On the primary boot path, we simply enable the MMU with 48-bit virtual
addressing initially, and update TCR.T1SZ if LVA is supported from C
code, right before creating the kernel mapping. Given that TTBR1 still
points to reserved_pg_dir at this point, updating TCR.T1SZ should be
safe without the need for explicit TLB maintenance.

Since this gets rid of all accesses to the vabits_actual variable from
asm code that occurred before TCR.T1SZ had been programmed, we no longer
have a need for this variable, and we can replace it with a C expression
that produces the correct value directly, based on the value of TCR.T1SZ.
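
A hedged sketch of such an expression (the helper name is invented; only
the T1SZ field position, bits [21:16], is architectural):

static inline u64 vabits_from_tcr(void)
{
	u64 t1sz = (read_sysreg(tcr_el1) >> 16) & 0x3f;	/* TCR_EL1.T1SZ */

	return 64 - t1sz;	/* live TTBR1 virtual address size */
}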

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-70-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# ba5b0333 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: mm: omit redundant remap of kernel image

Now that the early kernel mapping is created with all the right
attributes and segment boundaries, there is no longer a need to recreate
it and switch to it. This also means we no longer have to copy the kasan
shadow or some parts of the fixmap from one set of page tables to the
other.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-68-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 84b04d3e 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: kernel: Create initial ID map from C code

The asm code that creates the initial ID map is rather intricate and
hard to follow. This is problematic because it makes adding support for
things like LPA2 or WXN more difficult than necessary. Also, it is
parameterized like the rest of the MM code to run with a configurable
number of levels. This is rather pointless: all AArch64 CPUs implement
support for 48-bit virtual addressing, and many systems exist with DRAM
located outside of the 39-bit addressable range (the only smaller VA size
that is widely used), which requires additional tricks to make things work
in that combination.

So let's bite the bullet, and rip out all the asm macros, and fiddly
code, and replace it with a C implementation based on the newly added
routines for creating the early kernel VA mappings. And while at it,
create the initial ID map based on 48-bit virtual addressing as well,
regardless of the number of configured levels for the kernel proper.

Note that this code may execute with the MMU and caches disabled, and is
therefore not permitted to make unaligned accesses. The algorithm as
implemented shouldn't generate any, but to be safe, let's pass
-mstrict-align to the compiler.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-66-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 97a6f43b 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: head: Move early kernel mapping routines into C code

The asm version of the kernel mapping code works fine for creating a
coarse grained identity map, but for mapping the kernel down to its
exact boundaries with the right attributes, it is not suitable. This is
why we create a preliminary RWX kernel mapping first, and then rebuild
it from scratch later on.

So let's reimplement this in C, in a way that will make it unnecessary
to create the kernel page tables yet another time in paging_init().

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-63-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# aa6a52b2 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: head: move memstart_offset_seed handling to C code

Now that we can set BSS variables from the early code running from the
ID map, we can set memstart_offset_seed directly from the C code that
derives the value instead of passing it back and forth between C and asm
code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-60-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# e223a449 14-Feb-2024 Ard Biesheuvel <ardb@kernel.org>

arm64: idreg-override: Move to early mini C runtime

We will want to parse the ID register overrides even earlier, so that we
can take them into account before creating the kernel mapping. So
migrate the code and make it work in the context of the early C runtime.
We will move the invocation to an earlier stage in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240214122845.2033971-49-ardb+git@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 3567fa63 13-Dec-2023 Ard Biesheuvel <ardb@kernel.org>

arm64: kaslr: Adjust randomization range dynamically

Currently, we base the KASLR randomization range on a rough estimate of
the available space in the upper VA region: the lower 1/4th has the
module region and the upper 1/4th has the fixmap, vmemmap and PCI I/O
ranges, and so we pick a random location in the remaining space in the
middle.

Once we enable support for 5-level paging with 4k pages, this no longer
works: the vmemmap region, being dimensioned to cover a 52-bit linear
region, takes up so much space in the upper VA region (the size of which
is based on a 48-bit VA space for compatibility with non-LVA hardware)
that the region above the vmalloc region takes up more than a quarter of
the available space.

So instead of a heuristic, let's derive the randomization range from the
actual boundaries of the vmalloc region.
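
A hedged sketch of the idea, assuming the usual VMALLOC_START/VMALLOC_END
definitions and glossing over alignment; the function name is invented:

static u64 kaslr_image_base(u64 seed, u64 image_size)
{
	/* randomize within what the vmalloc region can actually hold */
	u64 range = (VMALLOC_END - VMALLOC_START) - image_size;

	return VMALLOC_START + (seed % range);
}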

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20231213084024.2367360-16-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>


# b8466fe8 17-Oct-2023 Arnd Bergmann <arnd@arndb.de>

efi: move screen_info into efi init code

After the vga console no longer relies on global screen_info, there are
only two remaining use cases:

- on the x86 architecture, it is used for multiple boot methods
(bzImage, EFI, Xen, kexec) to communicate the initial VGA or framebuffer
settings to a number of device drivers.

- on other architectures, it is only used as part of the EFI stub,
and only for the three sysfb framebuffers (simpledrm, simplefb, efifb).

Remove the duplicate definitions by moving the structure into the
efi-init.c file that sets it up initially for the EFI case, leaving x86
as an exception that retains its own definition for non-EFI boots.

The added #ifdefs here are optional; I added them to further limit the
reach of screen_info to configurations that have at least one of the
users enabled.

Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20231017093947.3627976-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


# 45dd403d 18-Apr-2023 Ard Biesheuvel <ardb@kernel.org>

efi/zboot: arm64: Inject kernel code size symbol into the zboot payload

The EFI zboot code is not built as part of the kernel proper, like the
ordinary EFI stub, but still needs access to symbols that are defined
only internally in the kernel, and are left unexposed deliberately to
avoid inadvertently creating an ABI that we're stuck with later.

So capture the code size of the kernel image, and inject it as an
ELF symbol into the object that contains the compressed payload, where
it will be accessible to zboot code that needs it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>


# 8bf0a804 30-Jan-2023 Mark Rutland <mark.rutland@arm.com>

arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap

When Priority Mask Hint Enable (PMHE) == 0b1, the GIC may use the PMR
value to determine whether to signal an IRQ to a PE, and consequently
after a change to the PMR value, a DSB SY may be required to ensure that
interrupts are signalled to a CPU in finite time. When PMHE == 0b0,
interrupts are always signalled to the relevant PE, and all masking
occurs locally, without requiring a DSB SY.

Since commit:

f226650494c6aa87 ("arm64: Relax ICC_PMR_EL1 accesses when ICC_CTLR_EL1.PMHE is clear")

... we handle this dynamically: in most cases a static key is used to
determine whether to issue a DSB SY, but the entry code must read from
ICC_CTLR_EL1 as static keys aren't accessible from plain assembly.

It would be much nicer to use an alternative instruction sequence for
the DSB, as this would avoid the need to read from ICC_CTLR_EL1 in the
entry code, and for most other code this will result in simpler code
generation with fewer instructions and fewer branches.

This patch adds a new ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap which is
only set when ICC_CTLR_EL1.PMHE == 0b0 (and GIC priority masking is in
use). This allows us to replace the existing users of the
`gic_pmr_sync` static key with alternative sequences which default to a
DSB SY and are relaxed to a NOP when PMHE is not in use.
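
A hedged sketch of such a sequence, assuming the arm64 ALTERNATIVE()
inline-asm macro and the cpucap introduced here (the function name is
illustrative):

static inline void pmr_sync_relaxed(void)
{
	/* DSB SY by default, patched to a NOP when PMHE is not in use */
	asm volatile(ALTERNATIVE("dsb sy", "nop",
				 ARM64_HAS_GIC_PRIO_RELAXED_SYNC));
}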

The entry assembly management of the PMR is slightly restructured to use
a branch (rather than multiple NOPs) when priority masking is not in
use. This is more in keeping with other alternatives in the entry
assembly, and permits the use of a separate alternative for the
PMHE-dependent DSB SY (and removal of the conditional branch this
currently requires). For consistency I've adjusted both the save and
restore paths.

According to bloat-o-meter, when building defconfig +
CONFIG_ARM64_PSEUDO_NMI=y this shrinks the kernel text by ~4KiB:

| add/remove: 4/2 grow/shrink: 42/310 up/down: 332/-5032 (-4700)

The resulting vmlinux is ~66KiB smaller, though the resulting Image size
is unchanged due to padding and alignment:

| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 137508344 Jan 17 14:11 vmlinux-after
| -rwxr-xr-x 1 mark mark 137575440 Jan 17 13:49 vmlinux-before
| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rw-r--r-- 1 mark mark 38777344 Jan 17 14:11 Image-after
| -rw-r--r-- 1 mark mark 38777344 Jan 17 13:49 Image-before

Prior to this patch we did not verify the state of ICC_CTLR_EL1.PMHE on
secondary CPUs. As of this patch this is verified by the cpufeature code
when using GIC priority masking (i.e. when using pseudo-NMIs).

Note that since commit:

7e3a57fa6ca831fa ("arm64: Document ICC_CTLR_EL3.PMHE setting requirements")

... Documentation/arm64/booting.rst specifies:

| - ICC_CTLR_EL3.PMHE (bit 6) must be set to the same value across
| all CPUs the kernel is executing on, and must stay constant
| for the lifetime of the kernel.

... so that should not adversely affect any compliant systems, and as
we'll only check for the absence of PMHE when using pseudo-NMIs, this
will only fire when such a mismatch would adversely affect the system.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230130145429.903791-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 61786170 11-Jan-2023 Ard Biesheuvel <ardb@kernel.org>

efi: arm64: enter with MMU and caches enabled

Instead of cleaning the entire loaded kernel image to the PoC and
disabling the MMU and caches before branching to the kernel's bare metal
entry point, we can leave the MMU and caches enabled, and rely on EFI's
cacheable 1:1 mapping of all of system RAM (which is mandated by the
spec) to populate the initial page tables.

This removes the need for managing coherency in software, which is
tedious and error prone.

Note that we still need to clean the executable region of the image to
the PoU if this is required for I/D coherency, but only if we actually
decided to move the image in memory, as otherwise, this will have been
taken care of by the loader.

This change affects both the builtin EFI stub as well as the zboot
decompressor, which now carries the entire EFI stub along with the
decompression code and the compressed image.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-7-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 169cd0f8 10-Nov-2022 Quentin Perret <qperret@google.com>

KVM: arm64: Don't unnecessarily map host kernel sections at EL2

We no longer need to map the host's '.rodata' and '.bss' sections in the
stage-1 page-table of the pKVM hypervisor at EL2, so remove those
mappings and avoid creating any future dependencies at EL2 on
host-controlled data structures.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-25-will@kernel.org


# 73f38ef2 10-Nov-2022 Will Deacon <will@kernel.org>

KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2

Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanigans, as this is used to configure the VTTBR register for the
guest stage-2.

In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it
from the host value while it is still trusted.
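
A hedged sketch of that one-time initialisation, using the kvm_nvhe_sym()
prefixing helper to name the hyp-private copy (the surrounding function is
illustrative):

static void init_hyp_vmid_bits(void)
{
	/* seed the EL2 copy from the host value while it is still trusted */
	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
}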

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-23-will@kernel.org


# fe41a7f8 10-Nov-2022 Quentin Perret <qperret@google.com>

KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host

When pKVM is enabled, the hypervisor at EL2 does not trust the host at
EL1 and must therefore prevent it from having unrestricted access to
internal hypervisor state.

The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor
per-cpu allocations, so move this into the nVHE code where it
cannot be modified by the untrusted host at EL1.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-22-will@kernel.org


# 13e248aa 10-Nov-2022 Will Deacon <will@kernel.org>

KVM: arm64: Provide I-cache invalidation by virtual address at EL2

In preparation for handling cache maintenance of guest pages from within
the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou()
which will later be plumbed into the stage-2 page-table cache
maintenance callbacks, ensuring that the initial contents of pages
mapped as executable into the guest stage-2 page-table are visible to the
instruction fetcher.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110190259.26861-17-will@kernel.org


# da8dd0c7 11-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

efi: libstub: Provide local implementations of strrchr() and memchr()

Clone the implementations of strrchr() and memchr() in lib/string.c so
we can use them in the standalone zboot decompressor app. These routines
are used by the FDT handling code.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 2e6fa86f 11-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

efi: libstub: Enable efi_printk() in zboot decompressor

Split the efi_printk() routine into its own source file, and provide
local implementations of strlen() and strnlen() so that the standalone
zboot app can use efi_err(), efi_info(), etc.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# 52dce39c 11-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

efi: libstub: Clone memcmp() into the stub

We will no longer be able to call into the kernel image once we merge
the decompressor with the EFI stub, so we need our own implementation of
memcmp(). Let's add the one from lib/string.c and simplify it.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# fa882a13 11-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

efi: libstub: Use local strncmp() implementation unconditionally

In preparation for moving the EFI stub functionality into the zboot
decompressor, switch to the stub's implementation of strncmp()
unconditionally.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# aaeb3fc6 17-Oct-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: efi: Move dcache cleaning of loaded image out of efi_enter_kernel()

The efi_enter_kernel() routine will be shared between the existing EFI
stub and the zboot decompressor, and the version of
dcache_clean_to_poc() that the core kernel exports to the stub will not
be available in the latter case.

So move the handling into the .c file which will remain part of the stub
build that integrates directly with the kernel proper.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>


# c82ceb44 09-Aug-2022 Ard Biesheuvel <ardb@kernel.org>

efi/libstub: use EFI provided memcpy/memset routines

The stub is used in different execution environments, but on arm64,
RISC-V and LoongArch, we still use the core kernel's implementation of
memcpy and memset, as they are just a branch instruction away, and can
generally be reused even from code such as the EFI stub that runs in a
completely different address space.

KAsan complicates this slightly, resulting in the need for some hacks to
expose the uninstrumented, __ prefixed versions as the normal ones, as
the latter are instrumented to include the KAsan checks, which only work
in the core kernel.

Unfortunately, #define'ing memcpy to __memcpy when building C code does
not guarantee that no explicit memcpy() calls will be emitted. And with
the upcoming zboot support, which consists of a separate binary which
therefore needs its own implementation of memcpy/memset anyway, it's
better to provide one explicitly instead of linking to the existing one.

Given that EFI exposes implementations of memmove() and memset() via the
boot services table, let's wire those up in the appropriate way, and
drop the references to the core kernel ones.
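
A hedged sketch of that wiring, using the stub's efi_bs_call() helper and
the CopyMem()/SetMem() boot services (the exact code differs in detail):

void *memcpy(void *dst, const void *src, size_t len)
{
	efi_bs_call(copy_mem, dst, src, len);
	return dst;
}

void *memset(void *dst, int c, size_t len)
{
	efi_bs_call(set_mem, dst, len, c & 0xffU);
	return dst;
}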

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>


# d926079f 12-Sep-2022 Mark Rutland <mark.rutland@arm.com>

arm64: alternatives: add shared NOP callback

For each instance of an alternative, the compiler outputs a distinct
copy of the alternative instructions into a subsection. As the compiler
doesn't have special knowledge of alternatives, it cannot coalesce these
to save space.

In a defconfig kernel built with GCC 12.1.0, there are approximately
10,000 instances of alternative_has_feature_likely(), where the
replacement instruction is always a NOP. As NOPs are
position-independent, we don't need a unique copy per alternative
sequence.

This patch adds a callback to patch an alternative sequence with NOPs,
and makes use of it in alternative_has_feature_likely(). So that this
can be used for other sites in future, this is written to patch multiple
instructions up to the original sequence length.
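
A hedged sketch of such a callback, assuming the arm64 alt_instr callback
signature and the existing NOP encoding helper:

void alt_cb_patch_nops(struct alt_instr *alt, __le32 *origptr,
		       __le32 *updptr, int nr_inst)
{
	int i;

	/* overwrite the whole original sequence with NOPs */
	for (i = 0; i < nr_inst; i++)
		updptr[i] = cpu_to_le32(aarch64_insn_gen_nop());
}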

For NVHE, an alias is added to image-vars.h.

For modules, the callback is exported. Note that as modules are loaded
within 2GiB of the kernel, an alt_instr entry in a module can always
refer directly to the callback, and no special handling is necessary.

When building with GCC 12.1.0, the vmlinux is ~158KiB smaller, though
the resulting Image size is unchanged due to alignment constraints and
padding:

| % ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 134644592 Sep 1 14:52 vmlinux-after
| -rwxr-xr-x 1 mark mark 134486232 Sep 1 14:50 vmlinux-before
| % ls -al Image-*
| -rw-r--r-- 1 mark mark 37108224 Sep 1 14:52 Image-after
| -rw-r--r-- 1 mark mark 37108224 Sep 1 14:50 Image-before

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# 21fb26bf 12-Sep-2022 Mark Rutland <mark.rutland@arm.com>

arm64: alternatives: add alternative_has_feature_*()

Currently we use a mixture of alternative sequences and static branches
to handle features detected at boot time. For ease of maintenance we
generally prefer to use static branches in C code, but this has a few
downsides:

* Each static branch has metadata in the __jump_table section, which is
not discarded after features are finalized. This wastes some space,
and slows down the patching of other static branches.

* The static branches are patched at a different point in time from the
alternatives, so changes are not atomic. This leaves a transient
period where there could be a mismatch between the behaviour of
alternatives and static branches, which could be problematic for some
features (e.g. pseudo-NMI).

* More (instrumentable) kernel code is executed to patch each static
branch, which can be risky when patching certain features (e.g.
irqflags management for pseudo-NMI).

* When CONFIG_JUMP_LABEL=n, static branches are turned into a load of a
flag and a conditional branch. This means it isn't safe to use such
static branches in an alternative address space (e.g. the NVHE/PKVM
hyp code), where the generated address isn't safe to access.

To deal with these issues, this patch introduces new
alternative_has_feature_*() helpers, which work like static branches but
are patched using alternatives. This ensures the patching is performed
at the same time as other alternative patching, allows the metadata to
be freed after patching, and is safe for use in alternative address
spaces.
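
A hedged sketch of the likely variant, assuming ALTERNATIVE_CB() takes an
(old instruction, feature, callback) triplet as reworked by this series;
details of the real implementation may differ:

static __always_inline bool
alternative_has_feature_likely(unsigned long feature)
{
	asm_volatile_goto(
	ALTERNATIVE_CB("b	%l[l_no]", %[feature], alt_cb_patch_nops)
	:
	: [feature] "i" (feature)
	:
	: l_no);

	return true;
l_no:
	return false;
}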

Note that all supported toolchains have asm goto support, and since
commit:

a0a12c3ed057af57 ("asm goto: eradicate CC_HAS_ASM_GOTO")

... the CC_HAS_ASM_GOTO Kconfig symbol has been removed, so no feature
check is necessary, and we can always make use of asm goto.

Additionally, note that:

* This has no impact on cpus_have_cap(), which is a dynamic check.

* This has no functional impact on cpus_have_const_cap(). The branches
are patched slightly later than before this patch, but these branches
are not reachable until caps have been finalised.

* It is now invalid to use cpus_have_final_cap() in the window between
feature detection and patching. All existing uses are only expected
after patching anyway, so this should not be a problem.

* The LSE atomics will now be enabled during alternatives patching
rather than immediately before. As the LL/SC and LSE atomics are
functionally equivalent this should not be problematic.

When building defconfig with GCC 12.1.0, the resulting Image is 64KiB
smaller:

| % ls -al Image-*
| -rw-r--r-- 1 mark mark 37108224 Aug 23 09:56 Image-after
| -rw-r--r-- 1 mark mark 37173760 Aug 23 09:54 Image-before

According to bloat-o-meter.pl:

| add/remove: 44/34 grow/shrink: 602/1294 up/down: 39692/-61108 (-21416)
| Function old new delta
| [...]
| Total: Before=16618336, After=16596920, chg -0.13%
| add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-1296 (-1296)
| Data old new delta
| arm64_const_caps_ready 16 - -16
| cpu_hwcap_keys 1280 - -1280
| Total: Before=8987120, After=8985824, chg -0.01%
| add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
| RO Data old new delta
| Total: Before=18408, After=18408, chg +0.00%

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220912162210.3626215-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# fbf6ad5e 29-Jun-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: lds: use PROVIDE instead of conditional definitions

Currently, a build with CONFIG_EFI=n and CONFIG_KASAN=y will not
complete successfully because of missing symbols. This is due to the
fact that the __pi_ prefixed aliases for __memcpy/__memmove were put
inside a #ifdef CONFIG_EFI block inadvertently, and are therefore
missing from the build in question.

These definitions should only be provided when needed, as they will
otherwise clutter up the symbol table, kallsyms etc for no reason.
Fortunately, instead of using CPP conditionals, we can achieve the same
result by using the linker's PROVIDE() directive, which only defines a
symbol if it is required to complete the link. So let's use that for all
symbols alias definitions.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220629083246.3729177-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# aacd149b 24-Jun-2022 Ard Biesheuvel <ardb@kernel.org>

arm64: head: avoid relocating the kernel twice for KASLR

Currently, when KASLR is in effect, we set up the kernel virtual address
space twice: the first time, the KASLR seed is looked up in the device
tree, and the kernel virtual mapping is torn down and recreated again,
after which the relocations are applied a second time. The latter step
means that statically initialized global pointer variables will be reset
to their initial values, and to ensure that BSS variables are not set to
values based on the initial translation, they are cleared again as well.

All of this is needed because we need the command line (taken from the
DT) to tell us whether or not to randomize the virtual address space
before entering the kernel proper. However, this code has expanded
little by little and now creates global state unrelated to the virtual
randomization of the kernel before the mapping is torn down and set up
again, and the BSS cleared for a second time. This has created some
issues in the past, and it would be better to avoid this little dance if
possible.

So instead, let's use the temporary mapping of the device tree, and
execute the bare minimum of code to decide whether or not KASLR should
be enabled, and what the seed is. Only then, create the virtual kernel
mapping, clear BSS, etc and proceed as normal. This avoids the issues
around inconsistent global state due to BSS being cleared twice, and is
generally more maintainable, as it permits us to defer all the remaining
DT parsing and KASLR initialization to a later time.

This means the relocation fixup code runs only a single time as well,
allowing us to simplify the RELR handling code too, which is not
idempotent and was therefore required to keep track of the offset that
was applied the first time around.

Note that this means we have to clone a pair of FDT library objects, so
that we can control how they are built - we need the stack protector
and other instrumentation disabled so that the code can tolerate being
called this early. Note that only the kernel page tables and the
temporary stack are mapped read-write at this point, which ensures that
the early code does not modify any global state inadvertently.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220624150651.1358849-21-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>


# f8051e96 21-Nov-2021 Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>

KVM: arm64: Make VMID bits accessible outside of allocator

Since we already set kvm_arm_vmid_bits in the VMID allocator init
function, make it accessible outside the allocator as well so that
it can be used in a subsequent patch.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211122121844.867-3-shameerali.kolothum.thodi@huawei.com


# 228a26b9 10-Dec-2021 James Morse <james.morse@arm.com>

arm64: Use the clearbhb instruction in mitigations

Future CPUs may implement a clearbhb instruction that is sufficient
to mitigate SpectreBHB. CPUs that implement this instruction, but
not CSV2.3, must be affected by Spectre-BHB.

Add support to use this instruction as the BHB mitigation on CPUs
that support it. The instruction is in the hint space, so it will
be treated as a NOP by older CPUs.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>


# 558c303c 10-Nov-2021 James Morse <james.morse@arm.com>

arm64: Mitigate spectre style branch history side channels

Speculation attacks against some high-performance processors can
make use of branch history to influence future speculation.
When taking an exception from user-space, a sequence of branches
or a firmware call overwrites or invalidates the branch history.

The sequence of branches is added to the vectors, and should appear
before the first indirect branch. For systems using KPTI the sequence
is added to the kpti trampoline where it has a free register as the exit
from the trampoline is via a 'ret'. For systems not using KPTI, the same
register tricks are used to free up a register in the vectors.

For the firmware call, arch-workaround-3 clobbers 4 registers, so
there is no choice but to save them to the EL1 stack. This only happens
for entry from EL0, so if we take an exception due to the stack access,
it will not become re-entrant.

For KVM, the existing branch-predictor-hardening vectors are used.
When a spectre version of these vectors is in use, the firmware call
is sufficient to mitigate against Spectre-BHB. For the non-spectre
versions, the sequence of branches is added to the indirect vector.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>


# be399d82 10-Nov-2021 Sean Christopherson <seanjc@google.com>

KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y

Move the definition of kvm_arm_pmu_available to pmu-emul.c and, out of
"necessity", hide it behind CONFIG_HW_PERF_EVENTS. Provide a stub for
the key's wrapper, kvm_arm_support_pmu_v3(). Moving the key's definition
out of perf.c will allow a future commit to delete perf.c entirely.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-16-seanjc@google.com


# fade9c2c 24-May-2021 Fuad Tabba <tabba@google.com>

arm64: Rename arm64-internal cache maintenance functions

Although naming across the codebase isn't that consistent, it
tends to follow certain patterns. Moreover, the term "flush"
isn't defined in the Arm Architecture reference manual, and might
be interpreted to mean clean, invalidate, or both for a cache.

Rename arm64-internal functions to make the naming internally
consistent, as well as making it consistent with the Arm ARM, by
specifying whether it applies to the instruction, data, or both
caches, whether the operation is a clean, invalidate, or both.
Also specify which point the operation applies to, i.e., to the
point of unification (PoU), coherency (PoC), or persistence
(PoP).

This commit applies the following sed transformation to all files
under arch/arm64:

"s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\
"s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\
"s/\binvalidate_icache_range\b/icache_inval_pou/g;"\
"s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\
"s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\
"s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\
"s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\
"s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\
"s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\
"s/\b__flush_icache_all\b/icache_inval_all_pou/g;"

Note that __clean_dcache_area_poc is deliberately missing a word
boundary check at the beginning in order to match the efistub
symbols in image-vars.h.

Also note that, despite its name, __flush_icache_range operates
on both instruction and data caches. The name change here
reflects that.

No functional change intended.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>


# aec0fae6 18-Mar-2021 Andrew Scull <ascull@google.com>

KVM: arm64: Log source when panicking from nVHE hyp

To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.

If the panic was due to a BUG(), look up the metadata to log the file
and line, if available; otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.

__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.

The following is an example of the logs generated by a BUG in nVHE hyp.

[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com


# 755db234 21-Mar-2021 Marc Zyngier <maz@kernel.org>

KVM: arm64: Generate final CTR_EL0 value when running in Protected mode

In protected mode, late CPUs are not allowed to boot (enforced by
the PSCI relay). We can thus specialise the read_ctr macro to
always return a pre-computed, sanitised value. Special care is
taken to prevent the use of this custom version outside of
protected mode.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>


# 1025c8c0 19-Mar-2021 Quentin Perret <qperret@google.com>

KVM: arm64: Wrap the host with a stage 2

When KVM runs in protected nVHE mode, make use of a stage 2 page-table
to give the hypervisor some control over the host memory accesses. The
host stage 2 is created lazily using large block mappings if possible,
and will default to page mappings in absence of a better solution.

From this point on, memory accesses from the host to protected memory
regions (e.g. not 'owned' by the host) are fatal and lead to hyp_panic().

Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-36-qperret@google.com


# f320bc74 19-Mar-2021 Quentin Perret <qperret@google.com>

KVM: arm64: Prepare the creation of s1 mappings at EL2

When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.

This leads to the following boot flow in nVHE Protected mode:

1. the host allocates memory for the hypervisor very early on, using
the memblock API;

2. the host creates a set of stage 1 page-table for EL2, installs the
EL2 vectors, and issues the __pkvm_init hypercall;

3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
and stores it in the memory pool provided by the host;

4. the hypervisor then extends its stage 1 mappings to include a
vmemmap in the EL2 VA space, hence allowing to use the buddy
allocator introduced in a previous patch;

5. the hypervisor jumps back in the idmap page, switches from the
host-provided page-table to the new one, and wraps up its
initialization by enabling the new allocator, before returning to
the host.

6. the host can free the now unused page-table created for EL2, and
will now need to issue hypercalls to make changes to the EL2 stage 1
mappings instead of modifying them directly.

Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.
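
For illustration, the host side of step 2 eventually boils down to a
hypercall of roughly this shape (a hedged sketch; the argument names are
assumptions based on the flow above):

static int pkvm_init_el2(phys_addr_t hyp_mem_base, size_t hyp_mem_size,
			 unsigned long hyp_va_bits)
{
	/* hand the donated memory and parameters to EL2 via __pkvm_init */
	return kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
				 num_possible_cpus(), hyp_va_bits);
}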

Credits to Will for __pkvm_init_switch_pgd.

Acked-by: Will Deacon <will@kernel.org>
Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com


# 7b4a7b5e 19-Mar-2021 Will Deacon <will@kernel.org>

KVM: arm64: Link position-independent string routines into .hyp.text

Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
code and ensure that we always execute the '__pi_' entry point on the
off chance that it changes in future.

[ qperret: Commit title nits and added linker script alias ]

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-3-qperret@google.com


# f27647b5 05-Mar-2021 Marc Zyngier <maz@kernel.org>

KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available

When running under a nesting hypervisor, it isn't guaranteed that
the virtual HW will include a PMU, in which case let's not try
to access the PMU registers in the world switch, as that'd be
deadly.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210209114844.3278746-3-maz@kernel.org
Message-Id: <20210305185254.3730990-6-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 537db4af 05-Jan-2021 David Brazdil <dbrazdil@google.com>

KVM: arm64: Remove patching of fn pointers in hyp

Storing a function pointer in hyp now generates relocation information
used at early boot to convert the address to hyp VA. The existing
alternative-based conversion mechanism is therefore obsolete. Remove it
and simplify its users.

Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com


# 0fea6e9a 22-Dec-2020 Andrey Konovalov <andreyknvl@google.com>

kasan, arm64: expand CONFIG_KASAN checks

Some #ifdef CONFIG_KASAN checks are only relevant for software KASAN modes
(either related to shadow memory or compiler instrumentation). Expand
those into CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS.
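
In other words, checks of the following shape (a minimal before/after
sketch):

/* before */
#ifdef CONFIG_KASAN
/* shadow-memory or compiler-instrumentation specific code */
#endif

/* after */
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
/* shadow-memory or compiler-instrumentation specific code */
#endif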

Link: https://lkml.kernel.org/r/e6971e432dbd72bb897ff14134ebb7e169bdcf0c.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 687413d3 02-Dec-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Support per_cpu_ptr in nVHE hyp code

When compiling with __KVM_NVHE_HYPERVISOR__, redefine per_cpu_offset()
to __hyp_per_cpu_offset() which looks up the base of the nVHE per-CPU
region of the given cpu and computes its offset from the
.hyp.data..percpu section.

This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
only this_cpu_ptr() was supported by setting TPIDR_EL2.
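
A hedged sketch of the redefinition as seen from nVHE code (the CONFIG_SMP
guard is an assumption):

#if defined(__KVM_NVHE_HYPERVISOR__) && defined(CONFIG_SMP)
/* use the nVHE per-CPU base rather than the kernel's __per_cpu_offset[] */
#define per_cpu_offset(cpu)	__hyp_per_cpu_offset((cpu))
#endif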

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-14-dbrazdil@google.com


# d3e1086c 02-Dec-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Init MAIR/TCR_EL2 from params struct

MAIR_EL2 and TCR_EL2 are currently initialized from their _EL1 values.
This will not work once KVM starts intercepting PSCI ON/SUSPEND SMCs
and initializing EL2 state before EL1 state.

Obtain the EL1 values during KVM init and store them in the init params
struct. The struct will stay in memory and can be used when booting new
cores.
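
A hedged, abbreviated sketch of such a parameter block (the real structure
carries more fields):

struct kvm_nvhe_init_params {
	unsigned long	mair_el2;	/* captured from MAIR_EL1 at init time */
	unsigned long	tcr_el2;	/* derived from TCR_EL1 and the idmap T0SZ */
	unsigned long	tpidr_el2;
	phys_addr_t	pgd_pa;		/* hyp page-table to install at EL2 */
};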

Take the opportunity to move copying the T0SZ value from idmap_t0sz in
KVM init rather than in .hyp.idmap.text. This avoids the need for the
idmap_t0sz symbol alias.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-12-dbrazdil@google.com


# 68b824e4 24-Oct-2020 Marc Zyngier <maz@kernel.org>

KVM: arm64: Patch kimage_voffset instead of loading the EL1 value

Directly using the kimage_voffset variable is fine for now, but
will become more problematic as we start distrusting EL1.

Instead, patch the kimage_voffset into the HYP text, ensuring
we don't have to load an untrusted value later on.

Signed-off-by: Marc Zyngier <maz@kernel.org>


# 7cd0aaaf 11-Oct-2020 Marc Zyngier <maz@kernel.org>

KVM: arm64: Turn host HVC handling into a dispatch table

Now that we can use function pointers, use a dispatch table to call
the individual HVC handlers, leading to more maintainable code.

Further improvements include helpers to declare the mapping of
local variables to values passed in the host context.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>


# d86de40d 26-Oct-2020 Mark Rutland <mark.rutland@arm.com>

arm64: cpufeature: upgrade hyp caps to final

We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.

A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.

So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access the cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.
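
A hedged sketch of the resulting upgrade path (helper names as described
above; details may differ from the actual patch):

static __always_inline bool cpus_have_const_cap(int num)
{
	if (is_hyp_code())		/* hyp only runs once caps are final */
		return cpus_have_final_cap(num);
	else if (system_capabilities_finalized())
		return __cpus_have_const_cap(num);
	else
		return cpus_have_cap(num);
}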

This should have no effect on non-hyp code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com


# 2a1198c9 22-Sep-2020 David Brazdil <dbrazdil@google.com>

kvm: arm64: Create separate instances of kvm_host_data for VHE/nVHE

Host CPU context is stored in a global per-cpu variable `kvm_host_data`.
In preparation for introducing independent per-CPU region for nVHE hyp,
create two separate instances of `kvm_host_data`, one for VHE and one
for nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-9-dbrazdil@google.com


# df4c8214 22-Sep-2020 David Brazdil <dbrazdil@google.com>

kvm: arm64: Duplicate arm64_ssbd_callback_required for nVHE hyp

Hyp keeps track of which cores require SSBD callback by accessing a
kernel-proper global variable. Create an nVHE symbol of the same name
and copy the value from kernel proper to nVHE as KVM is being enabled
on a core.

Done in preparation for separating percpu memory owned by kernel
proper and nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-8-dbrazdil@google.com


# ce492a16 22-Sep-2020 David Brazdil <dbrazdil@google.com>

kvm: arm64: Move nVHE hyp namespace macros to hyp_image.h

Minor cleanup to move all macros related to prefixing nVHE hyp section
and symbol names into one place: hyp_image.h.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-3-dbrazdil@google.com


# 29e8910a 17-Sep-2020 Marc Zyngier <maz@kernel.org>

KVM: arm64: Simplify handling of ARCH_WORKAROUND_2

Owing to the fact that the host kernel is always mitigated, we can
drastically simplify the WA2 handling by keeping the mitigation
state ON when entering the guest. This means the guest is either
unaffected or not mitigated.

This results in a nice simplification of the mitigation space,
and the removal of a lot of code that was never really used anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>


# b619d9aa 15-Sep-2020 Andrew Scull <ascull@google.com>

KVM: arm64: Introduce hyp context

During __guest_enter, save and restore from a new hyp context rather
than the host context. This is preparation for separation of the hyp and
host context in nVHE.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-9-ascull@google.com


# 6e3bfbb2 15-Sep-2020 Andrew Scull <ascull@google.com>

KVM: arm64: nVHE: Use separate vector for the host

The host is treated differently from the guests when an exception is
taken so introduce a separate vector that is specialized for the host.
This also allows the nVHE specific code to move out of hyp-entry.S and
into nvhe/host.S.

The host is only expected to make HVC calls and anything else is
considered invalid and results in a panic.

Hyp initialization is now passed the vector that is used for the host
and it is swapped for the guest vector during the context switch.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200915104643.2543892-7-ascull@google.com


# 33678059 12-Sep-2020 Alexandru Elisei <alexandru.elisei@arm.com>

irqchip/gic-v3: Support pseudo-NMIs when SCR_EL3.FIQ == 0

The GIC's internal view of the priority mask register and the assigned
interrupt priorities are based on whether GIC security is enabled and
whether firmware routes Group 0 interrupts to EL3. At the moment, we
support priority masking when ICC_PMR_EL1 and interrupt priorities are
either both modified by the GIC, or both left unchanged.

Trusted Firmware-A's default interrupt routing model allows Group 0
interrupts to be delivered to the non-secure world (SCR_EL3.FIQ == 0).
Unfortunately, this is precisely the case that the GIC driver doesn't
support: ICC_PMR_EL1 remains unchanged, but the GIC's view of interrupt
priorities is different from the software programmed values.

Support pseudo-NMIs when SCR_EL3.FIQ == 0 by using a different value to
mask regular interrupts. All the other values remain the same.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200912153707.667731-3-alexandru.elisei@arm.com


# e9ee186b 21-Aug-2020 James Morse <james.morse@arm.com>

KVM: arm64: Add kvm_extable for vaxorcism code

KVM has a one instruction window where it will allow an SError exception
to be consumed by the hypervisor without treating it as a hypervisor bug.
This is used to consume asynchronous external aborts that were caused by
the guest.

As we are about to add another location that survives unexpected exceptions,
generalise this code to make it behave like the host's extable.

KVM's version has to be mapped to EL2 to be accessible on nVHE systems.

The SError vaxorcism code is a one instruction window, so has two entries
in the extable. Because the KVM code is copied for VHE and nVHE, we end up
with four entries, half of which correspond with code that isn't mapped.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


# c04dd455 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Compile remaining hyp/ files for both VHE/nVHE

The following files in hyp/ contain only code shared by VHE/nVHE:
vgic-v3-sr.c, aarch32.c, vgic-v2-cpuif-proxy.c, entry.S, fpsimd.S
Compile them under both configurations. Deletions in image-vars.h reflect
eliminated dependencies of nVHE code on the rest of the kernel.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-14-dbrazdil@google.com


# 9aebdea4 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Duplicate hyp/timer-sr.c for VHE/nVHE

timer-sr.c contains a HVC handler for setting CNTVOFF_EL2 and two helper
functions for controlling access to the physical counter. The former is used by
both VHE/nVHE and is duplicated, the latter are used only by nVHE and moved
to nvhe/timer-sr.c.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-13-dbrazdil@google.com


# 13aeb9b4 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Split hyp/sysreg-sr.c to VHE/nVHE

sysreg-sr.c contains KVM's code for saving/restoring system registers, with
some code shared between VHE/nVHE. These common routines are moved to
a header file, VHE-specific code is moved to vhe/sysreg-sr.c and nVHE-specific
code to nvhe/sysreg-sr.c.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-12-dbrazdil@google.com


# d400c5b2 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Split hyp/debug-sr.c to VHE/nVHE

debug-sr.c contains KVM's code for context-switching debug registers, with some
code shared between VHE/nVHE. These common routines are moved to a header file,
VHE-specific code is moved to vhe/debug-sr.c and nVHE-specific code to
nvhe/debug-sr.c.

Functions are slightly refactored to move code hidden behind `has_vhe()` checks
to the corresponding .c files.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-11-dbrazdil@google.com


# 09cf57eb 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Split hyp/switch.c to VHE/nVHE

switch.c implements context-switching for KVM, with large parts shared between
VHE/nVHE. These common routines are moved to a header file, VHE-specific code
is moved to vhe/switch.c and nVHE-specific code is moved to nvhe/switch.c.

Previously __kvm_vcpu_run needed a different symbol name for VHE/nVHE. This
is cleaned up and the caller in arm.c simplified.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-10-dbrazdil@google.com


# e03fa291 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Duplicate hyp/tlb.c for VHE/nVHE

tlb.c contains code for flushing the TLB, with code shared between VHE/nVHE.
Because common code is small, duplicate tlb.c and specialize each copy for
VHE/nVHE.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-9-dbrazdil@google.com


# 208243c7 25-Jun-2020 Andrew Scull <ascull@google.com>

KVM: arm64: Move hyp-init.S to nVHE

hyp-init.S contains the identity-mapped initialisation code for the
non-VHE code that runs at EL2. It is only used for non-VHE.

Adjust code that calls into this to use the prefixed symbol name.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-8-dbrazdil@google.com


# b877e984 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Build hyp-entry.S separately for VHE/nVHE

hyp-entry.S contains the implementation of the KVM hyp vectors. This code is
mostly shared between VHE/nVHE, so compile it under both the VHE and nVHE
build rules. The nVHE-specific host HVC handler is hidden behind
__KVM_NVHE_HYPERVISOR__.

Adjust the code which selects which KVM hyp vectors to install so that it
chooses the correct VHE/nVHE symbol.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-7-dbrazdil@google.com
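
Not the kernel's exact code, but a sketch of the two sides described above:
the host HVC path is compiled only when the nVHE define is set, and the EL1
code that installs vectors picks between the unprefixed VHE symbol and its
__kvm_nvhe_-prefixed twin (the prefixing scheme appears in the build-rules
entry further down; the names here are illustrative):

#ifdef __KVM_NVHE_HYPERVISOR__
/* the host HVC handler would live only in the nVHE build of hyp-entry.S */
#endif

/*
 * EL1 side: the vector table exists twice after the split, once per build
 * of hyp-entry.S, so the installer must pick the matching symbol.
 */
extern char __kvm_hyp_vector[];			/* VHE build */
extern char __kvm_nvhe___kvm_hyp_vector[];	/* nVHE build, prefixed */

static inline char *hyp_vectors(int is_vhe)
{
	return is_vhe ? __kvm_hyp_vector : __kvm_nvhe___kvm_hyp_vector;
}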


# f50b6f6a 25-Jun-2020 Andrew Scull <ascull@google.com>

KVM: arm64: Handle calls to prefixed hyp functions

Once hyp functions are moved to a hyp object, they will have prefixed symbols.
This change declares the prefixed versions and takes their addresses for calls
to the hyp functions.

To aid migration, the hyp functions that have not yet moved have their prefixed
versions aliased to their non-prefixed versions. The alias list starts out
covering all the hyp functions and will shrink to nothing once the migration is
complete.

Signed-off-by: Andrew Scull <ascull@google.com>

[David: Extracted kvm_call_hyp nVHE branches into own helper macros, added
comments around symbol aliases.]

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-6-dbrazdil@google.com
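
A sketch of the two pieces this describes; the macros mirror the
__kvm_nvhe_ prefixing scheme used by the series but are illustrative rather
than a copy of the kernel headers:

/* Caller-side declaration of a prefixed hyp function, so EL1 code can take
 * its address (kvm_asm.h-style; names illustrative). */
#define kvm_nvhe_sym(sym)		__kvm_nvhe_##sym
#define DECLARE_KVM_NVHE_SYM(sym)	extern char kvm_nvhe_sym(sym)[]

DECLARE_KVM_NVHE_SYM(__kvm_flush_vm_context);

/*
 * Transitional alias, image-vars.h style, for a hyp function that has not
 * yet moved into the nVHE object: the prefixed symbol simply resolves to
 * the existing unprefixed definition at link time.
 *
 *	__kvm_nvhe___kvm_flush_vm_context = __kvm_flush_vm_context;
 */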


# 76217129 25-Jun-2020 David Brazdil <dbrazdil@google.com>

KVM: arm64: Add build rules for separate VHE/nVHE object files

Add new folders arch/arm64/kvm/hyp/{vhe,nvhe} and Makefiles for building code
that runs in EL2 under VHE and nVHE KVM, respectively. Add an include folder
for hyp-specific header files which will hold code common to VHE/nVHE.

Build nVHE code with -D__KVM_NVHE_HYPERVISOR__, VHE code with
-D__KVM_VHE_HYPERVISOR__.

Under nVHE, compile each source file into a `.hyp.tmp.o` object first, then
prefix all its symbols with "__kvm_nvhe_" using `objcopy` and produce
a `.hyp.o`. The suffixes were chosen so that VHE and nVHE can share some source
files while compiling them with different CFLAGS.

The nVHE ELF symbol prefix is added to kallsyms.c's list of ignored prefixes,
so EL2-only symbols will never appear in EL1 stack traces.

Due to symbol prefixing, add a section in image-vars.h for aliases of symbols
that are defined in nVHE EL2 and accessed by the kernel in EL1, or vice versa.

Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200625131420.71444-4-dbrazdil@google.com
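
Since this log follows image-vars.h itself, the new alias section is worth
sketching. These are linker-script-style assignments kept in the header and
pulled into the linker script, not C; the symbol below is illustrative, and
the real list is driven by which kernel symbols the nVHE object still
references after objcopy --prefix-symbols=__kvm_nvhe_ has renamed everything
it defines:

#ifdef CONFIG_KVM

/* EL1 symbol still called from (renamed) nVHE code: re-export it under the
 * prefixed name so the final link resolves. */
__kvm_nvhe_kvm_update_va_mask = kvm_update_va_mask;

#endif /* CONFIG_KVM */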


# 348a625d 26-Mar-2020 Ard Biesheuvel <ardb@kernel.org>

arm64: rename stext to primary_entry

For historical reasons, the primary entry routine living somewhere in
the inittext section is called stext(), which is confusing, given that
there is also a section marker called _stext which lives at a fixed
offset in the image (either 64 or 4096 bytes, depending on whether
CONFIG_EFI is enabled).

Let's rename stext to primary_entry(), which is a better description
and reflects the secondary_entry() routine that already exists for
SMP boot.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200326171423.3080-1-ardb@kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>


# b9676962 28-Feb-2020 Ard Biesheuvel <ardb@kernel.org>

efi/arm64: Clean EFI stub exit code from cache instead of avoiding it

Commit 9f9223778 ("efi/libstub/arm: Make efi_entry() an ordinary PE/COFF
entrypoint") modified the handover code written in assembler and, for
maintainability, aligned it with the logic used in the 32-bit ARM version:
avoid cache maintenance on the remaining instructions of the subroutine
that are executed with the MMU and caches off, and instead branch into the
relocated copy of the kernel image.

However, this assumes that the relocated copy is executable, which means we
expect EFI_LOADER_DATA regions to be executable as well. That is not a
reasonable assumption to make, even if it holds for most UEFI
implementations today.

So change this back, and add a __clean_dcache_area_poc() call to cover
the remaining code in the subroutine. While at it, switch the other
call site over to __clean_dcache_area_poc() as well, and clean up the
terminology in comments to avoid using 'flush' in the context of cache
maintenance. Also, let's switch to the new style asm annotations.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/20200228121408.9075-6-ardb@kernel.org


# 91d150c0 10-Feb-2020 Ard Biesheuvel <ardb@kernel.org>

efi/libstub: Clean up command line parsing routine

We currently parse the command line non-destructively, to avoid having to
allocate memory for a copy before passing it to the standard parsing
routines that are used by the core kernel, and which modify the input
to delineate the parsed tokens with NUL characters.

Instead, we call strstr() and strncmp() to go over the input multiple
times, and match prefixes rather than tokens, which implies that we
would match, e.g., 'nokaslrfoo' in the stub and disable KASLR, while
the kernel would disregard the option and run with KASLR enabled.

In order to avoid having to reason about whether and how this behavior
may be abused, let's clean up the parsing routines, and rebuild them
on top of the existing helpers.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
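
A standalone illustration of the difference (not stub code; the helpers are
made up): prefix matching with strstr() accepts "nokaslrfoo", while a
token-based check of the kind the core kernel's parser effectively performs
rejects it.

#include <stdio.h>
#include <string.h>

static int prefix_match(const char *cmdline, const char *opt)
{
	/* too permissive: any substring hit counts as a match */
	return strstr(cmdline, opt) != NULL;
}

static int token_match(const char *cmdline, const char *opt)
{
	size_t len = strlen(opt);
	const char *p = cmdline;

	while ((p = strstr(p, opt)) != NULL) {
		int start_ok = (p == cmdline) || (p[-1] == ' ');
		int end_ok = (p[len] == '\0') || (p[len] == ' ');

		if (start_ok && end_ok)
			return 1;
		p += len;
	}
	return 0;
}

int main(void)
{
	const char *cmdline = "console=ttyAMA0 nokaslrfoo";

	printf("prefix match: %d\n", prefix_match(cmdline, "nokaslr")); /* 1 */
	printf("token match : %d\n", token_match(cmdline, "nokaslr"));  /* 0 */
	return 0;
}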


# 9f922377 16-Feb-2020 Ard Biesheuvel <ardb@kernel.org>

efi/libstub/arm: Make efi_entry() an ordinary PE/COFF entrypoint

Expose efi_entry() as the PE/COFF entrypoint directly, instead of
jumping into a wrapper that fiddles with stack buffers and other
stuff that the compiler is much better at. The only reason this
code exists is to obtain a pointer to the base of the image, but
we can get the same value from the loaded_image protocol, which
we already need for other reasons anyway.

Update the return type as well, to make it consistent with what
is required for a PE/COFF executable entrypoint.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
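
For illustration only, and assuming gnu-efi style headers rather than the
kernel's stub internals: a PE/COFF entrypoint can recover its own load
address from the LoadedImage protocol's ImageBase field, which is the value
the removed wrapper existed to obtain.

#include <efi.h>

/* EFI_LOADED_IMAGE_PROTOCOL GUID as defined by the UEFI spec */
static EFI_GUID loaded_image_guid = {
	0x5b1b31a1, 0x9562, 0x11d2,
	{ 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b }
};

EFI_STATUS EFIAPI efi_main(EFI_HANDLE image_handle, EFI_SYSTEM_TABLE *systab)
{
	EFI_LOADED_IMAGE *image;
	EFI_STATUS status;

	status = systab->BootServices->HandleProtocol(image_handle,
						      &loaded_image_guid,
						      (VOID **)&image);
	if (EFI_ERROR(status))
		return status;

	/* image->ImageBase is the runtime base address of this binary */
	return EFI_SUCCESS;
}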


# 90776dd1 13-Aug-2019 Kees Cook <keescook@chromium.org>

arm64/efi: Move variable assignments after SECTIONS

It seems that LLVM's linker does not correctly handle variable assignments
involving section positions that are updated during the SECTIONS
parsing. Commit aa69fb62bea1 ("arm64/efi: Mark __efistub_stext_offset as
an absolute symbol explicitly") ran into this too, but found a different
workaround.

However, this was not enough, as other variables were also miscalculated,
which manifested as boot failures under UEFI where __efistub__end did not
take the correct _end value (they should be the same):

$ ld.lld -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.lld -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.lld | egrep '\b(__efistub_|)_end\b'
368272: ffff000002218000 0 NOTYPE LOCAL HIDDEN 38 __efistub__end
368322: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 38 _end

$ aarch64-linux-gnu-ld.bfd -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.bfd -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.bfd | egrep '\b(__efistub_|)_end\b'
338124: ffff000012318000 0 NOTYPE LOCAL DEFAULT ABS __efistub__end
383812: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 15325 _end

To work around this, all of the __efistub_-prefixed variable assignments
need to be moved after the linker script's SECTIONS entry. As it turns
out, this also solves the problem fixed in commit aa69fb62bea1, so those
changes are reverted here.

Link: https://github.com/ClangBuiltLinux/linux/issues/634
Link: https://bugs.llvm.org/show_bug.cgi?id=42990
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will@kernel.org>