#
d96c66ab |
|
21-Mar-2024 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Rationalise KVM banner output We are not very consistent when it comes to displaying which mode we're in (VHE, {n,h}VHE, protected or not). For example, booting in protected mode with hVHE results in: [ 0.969545] kvm [1]: Protected nVHE mode initialized successfully which is mildly amusing considering that the machine is VHE only. We already cleaned this up a bit with commit 1f3ca7023fe6 ("KVM: arm64: print Hyp mode"), but that's still unsatisfactory. Unify the three strings into one and use a mess of conditional statements to sort it out (yes, it's a slow day). Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240321173706.3280796-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
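A minimal sketch of the unified-banner idea, using kernel helpers that exist in this era of the tree (is_kernel_in_hyp_mode(), is_protected_kvm_enabled(), cpus_have_final_cap()); the exact conditionals in the patch may differ:

    bool in_vhe = is_kernel_in_hyp_mode();

    kvm_info("%s%sVHE mode initialized successfully\n",
             is_protected_kvm_enabled() ? "Protected " : "",
             in_vhe ? "" : (cpus_have_final_cap(ARM64_KVM_HVHE) ? "h" : "n"));

For the case the commit complains about, this prints "Protected hVHE mode initialized successfully" rather than claiming nVHE on a VHE-only machine.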
|
#
5c1ebe9a |
|
27-Feb-2024 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Don't initialize idreg debugfs w/ preemption disabled Testing KVM with DEBUG_ATOMIC_SLEEP enabled doesn't get far before hitting the first splat: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 13062, name: vgic_lpi_stress preempt_count: 1, expected: 0 2 locks held by vgic_lpi_stress/13062: #0: ffff080084553240 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xc0/0x13f0 #1: ffff800080485f08 (&kvm->arch.config_lock){+.+.}-{3:3}, at: kvm_arch_vcpu_ioctl+0xd60/0x1788 CPU: 19 PID: 13062 Comm: vgic_lpi_stress Tainted: G W O 6.8.0-dbg-DEV #1 Call trace: dump_backtrace+0xf8/0x148 show_stack+0x20/0x38 dump_stack_lvl+0xb4/0xf8 dump_stack+0x18/0x40 __might_resched+0x248/0x2a0 __might_sleep+0x50/0x88 down_write+0x30/0x150 start_creating+0x90/0x1a0 __debugfs_create_file+0x5c/0x1b0 debugfs_create_file+0x34/0x48 kvm_reset_sys_regs+0x120/0x1e8 kvm_reset_vcpu+0x148/0x270 kvm_arch_vcpu_ioctl+0xddc/0x1788 kvm_vcpu_ioctl+0xb6c/0x13f0 __arm64_sys_ioctl+0x98/0xd8 invoke_syscall+0x48/0x108 el0_svc_common+0xb4/0xf0 do_el0_svc+0x24/0x38 el0_svc+0x54/0x128 el0t_64_sync_handler+0x68/0xc0 el0t_64_sync+0x1a8/0x1b0 kvm_reset_vcpu() disables preemption as it needs to unload vCPU state from the CPU to twiddle with it, which subsequently explodes when taking the parent inode's rwsem while creating the idreg debugfs file. Fix it by moving the initialization to kvm_arch_create_vm_debugfs(). Fixes: 891766581dea ("KVM: arm64: Add debugfs file for guest's ID registers") Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240227094115.1723330-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
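The pattern being fixed, sketched with placeholder parent/data/fops (the debugfs call is real, the surrounding context illustrative): debugfs file creation takes the parent inode's rwsem and may sleep, so it must not run with preemption disabled:

    preempt_disable();
    /* start_creating() does down_write() on the parent inode's rwsem,
     * which may sleep -> the splat above under DEBUG_ATOMIC_SLEEP */
    debugfs_create_file("idregs", 0444, parent, data, &fops);
    preempt_enable();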
|
#
c5bac1ef |
|
14-Feb-2024 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move existing feature disabling over to FGU infrastructure We already trap a bunch of existing features for the purpose of disabling them (MAIR2, POR, ACCDATA, SME...). Let's move them over to our brand new FGU infrastructure. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240214131827.2856277-20-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
888f0880 |
|
14-Feb-2024 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: nv: Add sanitising to VNCR-backed sysregs VNCR-backed "registers" are actually only memory. Which means that there is zero control over what the guest can write, and that it is the hypervisor's job to actually sanitise the content of the backing store. Yeah, this is fun. In order to preserve some form of sanity, add a repainting mechanism that makes use of a per-VM set of RES0/RES1 masks, one pair per VNCR register. These masks get applied on access to the backing store via __vcpu_sys_reg(), ensuring that the state that is consumed by KVM is correct. So far, nothing populates these masks, but stay tuned. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20240214131827.2856277-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
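A hedged sketch of the repainting applied on access to the backing store; the helper name is illustrative, not KVM's actual one:

    static u64 vncr_sanitise(u64 raw, u64 res0, u64 res1)
    {
            raw &= ~res0;   /* RES0 bits must read as zero */
            raw |= res1;    /* RES1 bits must read as one */
            return raw;
    }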
|
#
1f3ca702 |
|
09-Feb-2024 |
Joey Gouly <joey.gouly@arm.com> |
KVM: arm64: print Hyp mode Print which of the hyp modes is being used (hVHE, nVHE). Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240209103719.3813599-1-joey.gouly@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
3ed0b512 |
|
12-Nov-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: nv: Compute NV view of idregs as a one-off Now that we have a full copy of the idregs for each VM, there is no point in repainting the sysregs on each access. Instead, we can simply perform the transformation as a one-off and be done with it. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
63912245 |
|
15-Mar-2023 |
Wei Wang <wei.w.wang@intel.com> |
KVM: move KVM_CAP_DEVICE_CTRL to the generic check KVM_CAP_DEVICE_CTRL allows userspace to check if the kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM. Move KVM_CAP_DEVICE_CTRL to the generic check for two reasons: 1) it already supports arch-agnostic usages (i.e. KVM_DEV_TYPE_VFIO). For example, a userspace VFIO implementation may need to create KVM_DEV_TYPE_VFIO on x86, riscv, or arm etc. It is simpler to have it checked in the generic code than in each arch's code. 2) KVM_CREATE_DEVICE has been added to the generic code. Link: https://lore.kernel.org/all/20221215115207.14784-1-wei.w.wang@intel.com Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Acked-by: Anup Patel <anup@brainfault.org> (riscv) Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Link: https://lore.kernel.org/r/20230315101606.10636-1-wei.w.wang@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
bd412e2a |
|
27-Nov-2023 |
Ryan Roberts <ryan.roberts@arm.com> |
KVM: arm64: Use LPA2 page-tables for stage2 and hyp stage1 Implement a simple policy whereby if the HW supports FEAT_LPA2 for the page size we are using, always use LPA2-style page-tables for stage 2 and hyp stage 1 (assuming an nvhe hyp), regardless of the VMM-requested IPA size or HW-implemented PA size. When in use, we can now support up to 52-bit IPA and PA sizes. We use the previously created cpu feature to track whether LPA2 is supported for deciding whether to use the LPA2 or classic pte format. Note that FEAT_LPA2 brings support for bigger block mappings (512GB with 4KB, 64GB with 16KB). We explicitly don't enable these in the library because stage2_apply_range() works on batch sizes of the largest used block mapping, and increasing the size of the batch would lead to soft lockups. See commit 5994bc9e05c2 ("KVM: arm64: Limit stage2_apply_range() batch size to largest block"). With the addition of LPA2 support in the hypervisor, the PA size supported by the HW must be capped with a runtime decision, rather than simply using a compile-time decision based on PA_BITS. For example, on a system that advertises 52-bit PA but does not support FEAT_LPA2, a 4KB or 16KB kernel compiled with LPA2 support must still limit the PA size to 48 bits. Therefore, move the insertion of the PS field into TCR_EL2 out of __kvm_hyp_init assembly code and instead do it in cpu_prepare_hyp_mode() where the rest of TCR_EL2 is prepared. This allows us to figure out PS with kvm_get_parange(), which has the appropriate logic to ensure the above requirement (the PS field of VTCR_EL2 is already populated this way). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231127111737.1897081-8-ryan.roberts@arm.com
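A sketch of the runtime PS computation described above, assuming kvm_get_parange() and TCR_EL2_PS_SHIFT as defined in the kernel around this series; 'tcr' is the value being assembled in cpu_prepare_hyp_mode():

    u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);

    /* Cap PS at what the HW (and, without LPA2, the pte format) supports. */
    tcr |= (u64)kvm_get_parange(mmfr0) << TCR_EL2_PS_SHIFT;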
|
#
02e3858f |
|
07-Dec-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: vgic: Force vcpu vgic teardown on vcpu destroy When failing to create a vcpu because (for example) it has a duplicate vcpu_id, we destroy the vcpu. Amusingly, this leaves the redistributor registered with the KVM_MMIO bus. This is no good, and we should properly clean the mess. Force a teardown of the vgic vcpu interface, including the RD device before returning to the caller. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231207151201.3028710-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
27131b19 |
|
20-Oct-2023 |
Raghavendra Rao Ananta <rananta@google.com> |
KVM: arm64: Sanitize PM{C,I}NTEN{SET,CLR}, PMOVS{SET,CLR} before first run For unimplemented counters, the registers PM{C,I}NTEN{SET,CLR} and PMOVS{SET,CLR} are expected to have the corresponding bits RAZ. Hence, to ensure correct PMU emulation in KVM, mask out the RES0 bits. Defer this work to the point that userspace can no longer change the number of advertised PMCs. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-7-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
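A sketch of the masking, where n_counters (derived from PMCR_EL0.N, assumed non-zero) is an illustrative name; ARMV8_PMU_CYCLE_IDX and the __vcpu_sys_reg() accessor are the kernel's own:

    u64 mask = GENMASK(n_counters - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX);

    /* Clear the RAZ bits belonging to unimplemented counters. */
    __vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= mask;
    __vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= mask;
    __vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= mask;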
|
#
57fc267f |
|
20-Oct-2023 |
Reiji Watanabe <reijiw@google.com> |
KVM: arm64: PMU: Add a helper to read a vCPU's PMCR_EL0 Add a helper to read a vCPU's PMCR_EL0, and use it whenever KVM reads a vCPU's PMCR_EL0. Currently, the PMCR_EL0 value is tracked per vCPU. The following patches will make (only) PMCR_EL0.N track per guest. Having the new helper will be useful to combine the PMCR_EL0.N field (tracked per guest) and the other fields (tracked per vCPU) to provide the value of PMCR_EL0. No functional change intended. Reviewed-by: Sebastian Ott <sebott@redhat.com> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231020214053.2144305-4-rananta@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
42773357 |
|
20-Oct-2023 |
Reiji Watanabe <reijiw@google.com> |
KVM: arm64: Select default PMU in KVM_ARM_VCPU_INIT handler Future changes to KVM's sysreg emulation will rely on having a valid PMU instance to determine the number of implemented counters (PMCR_EL0.N). This is earlier than when userspace is expected to modify the vPMU device attributes, where the default is selected today. Select the default PMU when handling KVM_ARM_VCPU_INIT such that it is available in time for sysreg emulation. Reviewed-by: Sebastian Ott <sebott@redhat.com> Co-developed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20231020214053.2144305-3-rananta@google.com [Oliver: rewrite changelog] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
27cde4c0 |
|
18-Oct-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Rename helpers for VHE vCPU load/put The names for the helpers we expose to the 'generic' KVM code are a bit imprecise; we switch the EL0 + EL1 sysreg context and setup trap controls that do not need to change for every guest entry/exit. Rename + shuffle things around a bit in preparation for loading the stage-2 MMU context on vcpu_load(). Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
5eba523e |
|
18-Oct-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Reload stage-2 for VMID change on VHE Naturally, a change to the VMID for an MMU implies a new value for VTTBR. Reload on VMID change in anticipation of loading stage-2 on vcpu_load() instead of every guest entry. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
d8569fba |
|
16-Oct-2023 |
Mark Rutland <mark.rutland@arm.com> |
arm64: kvm: Use cpus_have_final_cap() explicitly Much of the arm64 KVM code uses cpus_have_const_cap() to check for cpucaps, but this is unnecessary and it would be preferable to use cpus_have_final_cap(). For historical reasons, cpus_have_const_cap() is more complicated than it needs to be. Before cpucaps are finalized, it will perform a bitmap test of the system_cpucaps bitmap, and once cpucaps are finalized it will use an alternative branch. This used to be necessary to handle some race conditions in the window between cpucap detection and the subsequent patching of alternatives and static branches, where different branches could be out-of-sync with one another (or w.r.t. alternative sequences). Now that we use alternative branches instead of static branches, these are all patched atomically w.r.t. one another, and there are only a handful of cases that need special care in the window between cpucap detection and alternative patching. Due to the above, it would be nice to remove cpus_have_const_cap(), and migrate callers over to alternative_has_cap_*(), cpus_have_final_cap(), or cpus_have_cap() depending on their requirements. This will remove redundant instructions and improve code generation, and will make it easier to determine how each callsite will behave before, during, and after alternative patching. KVM is initialized after cpucaps have been finalized and alternatives have been patched. Since commit: d86de40decaa14e6 ("arm64: cpufeature: upgrade hyp caps to final") ... use of cpus_have_const_cap() in hyp code is automatically converted to use cpus_have_final_cap(): | static __always_inline bool cpus_have_const_cap(int num) | { | if (is_hyp_code()) | return cpus_have_final_cap(num); | else if (system_capabilities_finalized()) | return __cpus_have_const_cap(num); | else | return cpus_have_cap(num); | } Thus, converting hyp code to use cpus_have_final_cap() directly will not result in any functional change. Non-hyp KVM code is also not executed until cpucaps have been finalized, and it would be preferable to extend the same treatment to this code and use cpus_have_final_cap() directly. This patch converts instances of cpus_have_const_cap() in KVM-only code over to cpus_have_final_cap(). As all of this code runs after cpucaps have been finalized, there should be no functional change as a result of this patch, but the redundant instructions generated by cpus_have_const_cap() will be removed from the non-hyp KVM code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
3f9cd0ca |
|
03-Oct-2023 |
Jing Zhang <jingzhangos@google.com> |
KVM: arm64: Allow userspace to get the writable masks for feature ID registers While the Feature ID range is well defined and pretty large, it isn't inconceivable that the architecture will eventually grow some other ranges that will need to similarly be described to userspace. Add a VM ioctl to allow userspace to get writable masks for feature ID registers in the following system register space: op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7} This is used to support mix-and-match userspace and kernels for writable ID registers, where userspace may want to know upfront whether it can actually tweak the contents of an idreg or not. Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that returns a bitmap of the valid ranges, which can subsequently be retrieved, one at a time, by setting the index of the set bit as the range identifier. Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
54a8006d |
|
27-Sep-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available If our fancy little table is present when calling kvm_mpidr_to_vcpu(), use it to recover the corresponding vcpu. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
5544750e |
|
27-Sep-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Build MPIDR to vcpu index cache at runtime The MPIDR_EL1 register contains a unique value that identifies the CPU. The only problem with it is that it is stupidly large (32 bits, once the useless stuff is removed). Trying to obtain a vcpu from an MPIDR value is a fairly common, yet costly operation: we iterate over all the vcpus until we find the correct one. While this is cheap for small VMs, it is pretty expensive on large ones, especially if you are trying to get to the one that's at the end of the list... In order to help with this, it is important to realise that the MPIDR values are actually structured, and that implementations tend to use a small number of significant bits in the 32bit space. We can use this fact to our advantage by computing a small hash table that uses the "compression" of the significant MPIDR bits as an index, giving us the vcpu index as a result. Given that the MPIDR values can be supplied by userspace, and that an evil VMM could decide to make *all* bits significant, resulting in a 4G-entry table, we only use this method if the resulting table fits in a single page. Otherwise, we fall back to the good old iterative method. Nothing uses that table just yet, but keep your eyes peeled. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
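A hedged sketch of the "compression" step (KVM's actual helper differs): gather only the significant MPIDR bits, PEXT-style, into a dense table index:

    static u16 mpidr_to_index(u64 mpidr, u64 significant_mask)
    {
            unsigned int bit, out = 0;
            u16 index = 0;

            for (bit = 0; bit < 64; bit++) {
                    if (!(significant_mask & BIT_ULL(bit)))
                            continue;
                    if (mpidr & BIT_ULL(bit))
                            index |= 1U << out;
                    out++;
            }
            return index;
    }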
|
#
5f4bd815 |
|
27-Sep-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Use vcpu_idx for invalidation tracking While vcpu_id isn't necessarily a bad choice as an identifier for the currently running vcpu, it is provided by userspace, and there is close to no guarantee that it would be unique. Switch it to vcpu_idx instead, for which we have much stronger guarantees. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
9a0a75d3 |
|
27-Sep-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Passing a vcpu_id to kvm_vgic_inject_irq() is silly for two reasons: - we often confuse vcpu_id and vcpu_idx - we eventually have to convert it back to a vcpu - we can't count Instead, pass a vcpu pointer, which is unambiguous. A NULL vcpu is also allowed for interrupts that are not private to a vcpu (such as SPIs). Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
1de10b7d |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Get rid of vCPU-scoped feature bitmap The vCPU-scoped feature bitmap was left in place a couple of releases ago in case the change to VM-scoped vCPU features broke anyone. Nobody has complained and the interop between VM and vCPU bitmaps is pretty gross. Throw it out. Link: https://lore.kernel.org/r/20230920195036.1169791-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
3d4b2a4c |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Remove unused return value from kvm_reset_vcpu() Get rid of the return value for kvm_reset_vcpu() as there are no longer any cases where it returns a nonzero value. Link: https://lore.kernel.org/r/20230920195036.1169791-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
d99fb82f |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler Move the feature check out of kvm_reset_vcpu() so we can make the function succeed unconditionally. Link: https://lore.kernel.org/r/20230920195036.1169791-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
12405b09 |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Prevent NV feature flag on systems w/o nested virt It would appear that userspace can select the NV feature flag regardless of whether the system actually supports the feature. Obviously a nested guest isn't getting far in this situation; let's reject the flag instead. Link: https://lore.kernel.org/r/20230920195036.1169791-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
baa28a53 |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl Test for feature support in the ioctl handler rather than kvm_reset_vcpu(). Continue to uphold our all-or-nothing policy with address and generic pointer authentication. Link: https://lore.kernel.org/r/20230920195036.1169791-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
be9c0c01 |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler Test that the system supports SVE before ever getting to kvm_reset_vcpu(). Link: https://lore.kernel.org/r/20230920195036.1169791-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
9116db11 |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler Test that the system supports PMUv3 before ever getting to kvm_reset_vcpu(). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230920195036.1169791-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
ef150908 |
|
20-Sep-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Add generic check for system-supported vCPU features To date KVM has relied on kvm_reset_vcpu() failing when the vCPU feature flags are unsupported by the system. This is a bit messy since kvm_reset_vcpu() is called at runtime outside of the KVM_ARM_VCPU_INIT ioctl when it is expected to succeed. Further complicating the matter is that kvm_reset_vcpu() must be idempotent w.r.t. the config_lock, as it isn't consistently called with the lock held. Prepare to move feature compatibility checks out of kvm_reset_vcpu() with a 'generic' check that compares the user-provided flags with a computed maximum feature set for the system. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20230920195036.1169791-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
f156a7d1 |
|
10-Aug-2023 |
Vincent Donnefort <vdonnefort@google.com> |
KVM: arm64: Remove size-order align in the nVHE hyp private VA range commit f922c13e778d ("KVM: arm64: Introduce pkvm_alloc_private_va_range()") and commit 92abe0f81e13 ("KVM: arm64: Introduce hyp_alloc_private_va_range()") added an alignment for the start address of any allocation into the nVHE hypervisor private VA range. This alignment (to the order of the allocation size) is intended to enable efficient stack verification (if the PAGE_SHIFT bit of the stack pointer is zero, the stack pointer is on the guard page and a stack overflow occurred). But this is only necessary for stack allocations and can waste a lot of VA space. So instead, add stack-specific functions that handle the guard-page requirements, while other users (e.g. the fixmap) only get page alignment. Reviewed-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811112037.1147863-1-vdonnefort@google.com
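The check the size-order alignment enables, sketched in C for clarity (the real hypervisor test lives in the overflow handling path):

    static bool hyp_stack_overflowed(unsigned long sp)
    {
            /* With this alignment, the PAGE_SHIFT bit of SP is zero iff
             * SP has fallen onto the guard page below the stack. */
            return !(sp & BIT(PAGE_SHIFT));
    }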
|
#
b1f778a2 |
|
20-Aug-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: pmu: Resync EL0 state on counter rotation Huang Shijie reports that, when profiling a guest from the host with a number of events that exceeds the number of available counters, the reported counts are wildly inaccurate. Without the counter oversubscription, the reported counts are correct. Their investigation indicates that upon counter rotation (which takes place on the back of a timer interrupt), we fail to re-apply the guest EL0 enabling, leading to the counting of host events instead of guest events. In order to solve this, add yet another hook between the host PMU driver and KVM, re-applying the guest EL0 configuration if the right conditions apply (the host is VHE, we are in interrupt context, and we interrupted a running vcpu). This triggers a new vcpu request which will apply the correct configuration on guest reentry. With this, we have the correct counts, even when the counters are oversubscribed. Reported-by: Huang Shijie <shijie@os.amperecomputing.com> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Tested-by: Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230809013953.7692-1-shijie@os.amperecomputing.com Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20230820090108.177817-1-maz@kernel.org
|
#
a77b31dc |
|
15-Aug-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: nv: Add SVC trap forwarding HFGITR_EL2 allows the trap of SVC instructions to EL2. Allow these traps to be forwarded. Take this opportunity to deny any 32bit activity when NV is enabled. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Jing Zhang <jingzhangos@google.com> Link: https://lore.kernel.org/r/20230815183903.2735724-24-maz@kernel.org
|
#
619b5072 |
|
10-Aug-2023 |
David Matlack <dmatlack@google.com> |
KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a range-based TLB invalidation where the range is defined by the memslot. Now that kvm_flush_remote_tlbs_range() can be called from common code we can just use that and drop a bunch of duplicate code from the arch directories. Note this adds a lockdep assertion for slots_lock being held when calling kvm_flush_remote_tlbs_memslot(), which was previously only asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(), but they all hold the slots_lock, so the lockdep assertion continues to hold true. Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating kvm_flush_remote_tlbs_memslot(), since it is no longer necessary. Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230811045127.3308641-7-rananta@google.com
|
#
1ba11dae |
|
27-Jul-2023 |
Shaoqin Huang <shahuang@redhat.com> |
KVM: arm64: Use the known cpu id instead of smp_processor_id() kvm_arch_vcpu_load() already has the parameter 'cpu', which holds the value of smp_processor_id(), so there is no need to fetch it again. Simply use the parameter. Signed-off-by: Shaoqin Huang <shahuang@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230727090754.1900310-1-shahuang@redhat.com
|
#
5346f7e1 |
|
10-Jul-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Always return generic v8 as the preferred target Userspace selecting an implementation-specific vCPU target has been completely useless for a very long time. Let's go whole hog and start returning the generic v8 target across all implementations as the preferred target. Uphold the pre-existing behavior by tolerating either the generic target or an implementation-specific target if the vCPU happens to be running on one of the lucky few parts. Acked-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230710193140.1706399-5-oliver.upton@linux.dev
|
#
ef984060 |
|
10-Jul-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Replace vCPU target with a configuration flag The value of kvm_vcpu_arch::target has been used to determine if a vCPU has actually been initialized. Storing this as an integer is needless at this point, as KVM doesn't do any microarch-specific emulation in the first place. Instead, all we care about is whether or not the vCPU has been initialized. Delete the field in favor of a vCPU configuration flag indicating if KVM_ARM_VCPU_INIT has completed for the vCPU. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230710193140.1706399-4-oliver.upton@linux.dev
|
#
c8a67729 |
|
10-Jul-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Remove pointless check for changed init target At any time there is only a single valid value for KVM_ARM_VCPU_INIT, depending on the current CPU implementation. In all likelihood, this will be the generic ARMv8 target. Drop the pointless check for a changed target value between calls to KVM_ARM_VCPU_INIT and instead rely on the check against kvm_target_cpu(). Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230710193140.1706399-3-oliver.upton@linux.dev
|
#
733c758e |
|
19-Jul-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp kvm_arm_hardware_enabled is rather misleading, since it doesn't track the state of all hardware resources needed for running a VM. What it actually tracks is whether or not the hyp cpu context has been initialized. Since we're now at the point where vgic + timer irq management has been separated from kvm_arm_hardware_enabled, rephrase it (and the associated helpers) to make it clear what state is being tracked. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230719231855.262973-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
c718ca0e |
|
19-Jul-2023 |
Raghavendra Rao Ananta <rananta@google.com> |
KVM: arm64: Fix hardware enable/disable flows for pKVM When running in protected mode, the hyp stub is disabled after pKVM is initialized, meaning the host cannot enable/disable the hyp at runtime. As such, kvm_arm_hardware_enabled is always 1 after initialization, and kvm_arch_hardware_enable() never enables the vgic maintenance irq or timer irqs. Unconditionally enable/disable the vgic + timer irqs in the respective calls, instead relying on the percpu bookkeeping in the generic code to keep track of which cpus have the interrupts unmasked. Fixes: 466d27e48d7c ("KVM: arm64: Simplify the CPUHP logic") Reported-by: Oliver Upton <oliver.upton@linux.dev> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
b321c31c |
|
13-Jul-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when running a preemptible kernel, as it is possible that a vCPU is blocked without requesting a doorbell interrupt. The issue is that any preemption that occurs between vgic_v4_put() and schedule() on the block path will mark the vPE as nonresident and *not* request a doorbell irq. This occurs because when the vcpu thread is resumed on its way to block, vcpu_load() will make the vPE resident again. Once the vcpu actually blocks, we don't request a doorbell anymore, and the vcpu won't be woken up on interrupt delivery. Fix it by tracking that we're entering WFI, and key the doorbell request on that flag. This allows us not to make the vPE resident when going through a preempt/schedule cycle, meaning we don't lose any state. Cc: stable@vger.kernel.org Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Suggested-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Co-developed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
970dee09 |
|
03-Jul-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Disable preemption in kvm_arch_hardware_enable() Since 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock"), hotplugging back a CPU whilst a guest is running results in a number of ugly splats as most of this code expects to run with preemption disabled, which isn't the case anymore. While the context is preemptable, it isn't migratable, which should be enough. But we have plenty of preemptible() checks all over the place, and our per-CPU accessors also disable preemption. Since this affects released versions, let's do the easy fix first, disabling preemption in kvm_arch_hardware_enable(). We can always revisit this with a more invasive fix in the future. Fixes: 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock") Reported-by: Kristina Martsenko <kristina.martsenko@arm.com> Tested-by: Kristina Martsenko <kristina.martsenko@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/aeab7562-2d39-e78e-93b1-4711f8cc3fa5@arm.com Cc: stable@vger.kernel.org # v6.3, v6.4 Link: https://lore.kernel.org/r/20230703163548.1498943-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
fa729bc7 |
|
04-Jul-2023 |
Sudeep Holla <sudeep.holla@arm.com> |
KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm Currently there is no synchronisation between finalize_pkvm() and kvm_arm_init() initcalls. finalize_pkvm() proceeds happily even if kvm_arm_init() fails, resulting in the following warning on all the CPUs and eventually a HYP panic: | kvm [1]: IPA Size Limit: 48 bits | kvm [1]: Failed to init hyp memory protection | kvm [1]: error initializing Hyp mode: -22 | | <snip> | | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : _kvm_host_prot_finalize+0x30/0x50 | lr : __flush_smp_call_function_queue+0xd8/0x230 | | Call trace: | _kvm_host_prot_finalize+0x3c/0x50 | on_each_cpu_cond_mask+0x3c/0x6c | pkvm_drop_host_privileges+0x4c/0x78 | finalize_pkvm+0x3c/0x5c | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | Failed to finalize Hyp protection: -22 | dtb=fvp-base-revc.dtb | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540! | kvm [95]: nVHE call trace: | kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8 | kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac | kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160 | kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 | kvm [95]: ---[ end nVHE call trace ]--- | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000 | Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | Workqueue: rpciod rpc_async_schedule | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x138/0x33c | nvhe_hyp_panic_handler+0x100/0x184 | new_slab+0x23c/0x54c | ___slab_alloc+0x3e4/0x770 | kmem_cache_alloc_node+0x1f0/0x278 | __alloc_skb+0xdc/0x294 | tcp_stream_alloc_skb+0x2c/0xf0 | tcp_sendmsg_locked+0x3d0/0xda4 | tcp_sendmsg+0x38/0x5c | inet_sendmsg+0x44/0x60 | sock_sendmsg+0x1c/0x34 | xprt_sock_sendmsg+0xdc/0x274 | xs_tcp_send_request+0x1ac/0x28c | xprt_transmit+0xcc/0x300 | call_transmit+0x78/0x90 | __rpc_execute+0x114/0x3d8 | rpc_async_schedule+0x28/0x48 | process_one_work+0x1d8/0x314 | worker_thread+0x248/0x474 | kthread+0xfc/0x184 | ret_from_fork+0x10/0x20 | SMP: stopping secondary CPUs | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000 | PHYS_OFFSET: 0x80000000 | CPU features: 0x00000000,1035b7a3,ccfe773f | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 ]--- Fix it by checking for the successful initialisation of kvm_arm_init() in finalize_pkvm() before proceeding any further.
Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege") Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
192df2aa |
|
22-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index KVM_ARM_VCPU_POWER_OFF is a bit index, _not_ a literal bitmask. Nonetheless, commit e3c1c0cae31e ("KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF") started using it that way, meaning that powering off a vCPU with the KVM_ARM_VCPU_INIT ioctl is completely broken. Fix it by using a shifted bit for the bitwise operations instead. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: e3c1c0cae31e ("KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF") Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230622160922.1925530-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
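The bug pattern versus the fix, illustratively ('features' stands in for the real field):

    features &= ~KVM_ARM_VCPU_POWER_OFF;      /* wrong: bit index used as a mask */
    features &= ~BIT(KVM_ARM_VCPU_POWER_OFF); /* right: index shifted into a bit */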
|
#
68667240 |
|
09-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Rip out the vestiges of the 'old' ID register scheme There's no longer a need for the baggage of the old scheme for handling configurable ID register fields. Rip it all out in favor of the generalized infrastructure. Link: https://lore.kernel.org/r/20230609190054.1542113-12-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
8c15c2a0 |
|
13-Jun-2023 |
Mostafa Saleh <smostafa@google.com> |
KVM: arm64: Use different pointer authentication keys for pKVM When the use of pointer authentication is enabled in the kernel it applies to both the kernel itself as well as KVM's nVHE hypervisor. The same keys are used for both the kernel and the nVHE hypervisor, which is less than desirable for pKVM as the host is not trusted at runtime. Naturally, the fix is to use a different set of keys for the hypervisor when running in protected mode. Have the host generate a new set of keys for the hypervisor before deprivileging the kernel. While there might be other sources of random directly available at EL2, this keeps the implementation simple, and the host is trusted anyways until it is deprivileged. Since the host and hypervisor no longer share a set of pointer authentication keys, start context switching them on the host entry/exit path exactly as we do for guest entry/exit. There is no need to handle CPU migration as the nVHE code is not migratable in the first place. Signed-off-by: Mostafa Saleh <smostafa@google.com> Link: https://lore.kernel.org/r/20230614122600.2098901-1-smostafa@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
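A sketch of the key generation, assuming the lo/hi layout of arm64's struct ptrauth_key; the plumbing that hands the keys to the hypervisor is elided:

    struct ptrauth_key apia;

    /* Fresh key material, generated while the host is still trusted. */
    apia.lo = get_random_u64();
    apia.hi = get_random_u64();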
|
#
75c76ab5 |
|
09-Jun-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration Just like we repainted the early arm64 code, we need to update the CPTR_EL2 accesses that are taking place in the nVHE code when hVHE is used, making them look as if they were CPACR_EL1 accesses. Just like the VHE code. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230609162200.2024064-14-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
cff3b5cf |
|
09-Jun-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Disable TTBR1_EL2 when using ARM64_KVM_HVHE When using hVHE, we end up with two TTBRs at EL2. That's great, but we're not quite ready for this just yet. Disable TTBR1_EL2 by setting TCR_EL2.EPD1 so that we only translate via TTBR0_EL2. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230609162200.2024064-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
d0daf5a2 |
|
09-Jun-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Force HCR_EL2.E2H when ARM64_KVM_HVHE is set Obviously, in order to be able to use VHE whilst at EL2, we need to set HCR_EL2.E2H. Do so when ARM64_KVM_HVHE is set. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230609162200.2024064-11-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
f90f9360 |
|
09-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Rewrite IMPDEF PMU version as NI KVM allows userspace to write an IMPDEF PMU version to the corresponding 32bit and 64bit ID register fields for the sake of backwards compatibility with kernels that lacked commit 3d0dba5764b9 ("KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation"). Plumbing that IMPDEF PMU version through to the guest is getting in the way of progress, and really doesn't make any sense in the first place. Bite the bullet and reinterpret the IMPDEF PMU version as NI (0) for userspace writes. Additionally, spill the dirty details into a comment. Link: https://lore.kernel.org/r/20230609190054.1542113-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
2251e9ff |
|
09-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Make vCPU feature flags consistent VM-wide To date KVM has allowed userspace to construct asymmetric VMs where particular features may only be supported on a subset of vCPUs. This wasn't really the intended usage pattern, and it is a total pain in the ass to keep working in the kernel. What's more, this is at odds with CPU features in host userspace, where asymmetric features are largely hidden or disabled. It's time to put an end to the whole game. Require all vCPUs in the VM to have the same feature set, rejecting deviants in the KVM_ARM_VCPU_INIT ioctl. Preserve some of the vestiges of per-vCPU feature flags in case we need to reinstate the old behavior for some limited configurations. Yes, this is a sign of cowardice around a user-visible change. Hoist all of the 32-bit limitations into kvm_vcpu_init_check_features() to avoid nested attempts to acquire the config_lock, which won't end well. Link: https://lore.kernel.org/r/20230609190054.1542113-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
e3c1c0ca |
|
09-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Relax invariance of KVM_ARM_VCPU_POWER_OFF Allow the value of KVM_ARM_VCPU_POWER_OFF to differ between calls to KVM_ARM_VCPU_INIT. Userspace can already change the state of the vCPU through the KVM_SET_MP_STATE ioctl, so making the bit invariant seems needlessly restrictive. Link: https://lore.kernel.org/r/20230609190054.1542113-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
a7a2c72a |
|
09-Jun-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Separate out feature sanitisation and initialisation kvm_vcpu_set_target() iteratively sanitises and copies feature flags in one go. This is rather odd, especially considering the fact that bitmap accessors can do the heavy lifting. A subsequent change will make vCPU features VM-wide, and fitting that into the present implementation is just a chore. Rework the whole thing to use bitmap accessors to sanitise and copy flags. Link: https://lore.kernel.org/r/20230609190054.1542113-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
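A sketch of the bitmap-accessor approach; system_supported_features() and user_features are hypothetical stand-ins for however the allowed set and the user input are obtained:

    DECLARE_BITMAP(allowed, KVM_VCPU_MAX_FEATURES);

    bitmap_from_u64(allowed, system_supported_features()); /* hypothetical */
    /* Sanitise and copy in one go. */
    bitmap_and(vcpu->arch.features, user_features, allowed,
               KVM_VCPU_MAX_FEATURES);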
|
#
12bdce4f |
|
23-May-2023 |
Will Deacon <will@kernel.org> |
KVM: arm64: Probe FF-A version and host/hyp partition ID during init Probe FF-A during pKVM initialisation so that we can detect any inconsistencies in the version or partition ID early on. Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230523101828.7328-3-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
2f440b72 |
|
26-Apr-2023 |
Ricardo Koller <ricarkol@google.com> |
KVM: arm64: Add KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE Add a capability for userspace to specify the eager split chunk size. The chunk size specifies how many pages to break at a time, using a single allocation. The bigger the chunk size, the more pages need to be allocated ahead of time. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20230426172330.1439644-6-ricarkol@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
4ff910be |
|
18-Apr-2023 |
Reiji Watanabe <reijiw@google.com> |
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init() kvm_arch_vcpu_ioctl_vcpu_init() doesn't acquire mp_state_lock when setting the mp_state to KVM_MP_STATE_RUNNABLE. Fix the code to acquire the lock. Signed-off-by: Reiji Watanabe <reijiw@google.com> [maz: minor refactor] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230419021852.2981107-2-reijiw@google.com
|
#
821d935c |
|
04-Apr-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Introduce support for userspace SMCCC filtering As the SMCCC (and related specifications) march towards an 'everything and the kitchen sink' interface for interacting with a system it becomes less likely that KVM will support every related feature. We could do better by letting userspace have a crack at it instead. Allow userspace to define an 'SMCCC filter' that applies to both HVCs and SMCs initiated by the guest. Supporting both conduits with this interface is important for a couple of reasons. Guest SMC usage is table stakes for a nested guest, as HVCs are always taken to the virtual EL2. Additionally, guests may want to interact with a service on the secure side which can now be proxied by userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-10-oliver.upton@linux.dev
|
#
fb88707d |
|
04-Apr-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Use a maple tree to represent the SMCCC filter Maple tree is an efficient B-tree implementation that is intended for storing non-overlapping intervals. Such a data structure is a good fit for the SMCCC filter as it is desirable to sparsely allocate the 32 bit function ID space. To that end, add a maple tree to kvm_arch and correctly init/teardown along with the VM. Wire in a test against the hypercall filter for HVCs which does nothing until the controls are exposed to userspace. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-8-oliver.upton@linux.dev
|
#
e0fc6b21 |
|
04-Apr-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Add vm fd device attribute accessors A subsequent change will allow userspace to convey a filter for hypercalls through a vm device attribute. Add the requisite boilerplate for vm attribute accessors. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230404154050.2270077-4-oliver.upton@linux.dev
|
#
8a5eb2d2 |
|
30-Mar-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data Having the timer IRQs duplicated into each vcpu isn't great, and becomes absolutely awful with NV. So let's move these into the per-VM arch_timer_vm_data structure. This simplifies a lot of code, but requires us to introduce a mutex so that we can reason about userspace trying to change an interrupt number while another vcpu is running, something that wasn't really well handled so far. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-12-maz@kernel.org
|
#
30ec7997 |
|
30-Mar-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: timers: Allow userspace to set the global counter offset And this is the moment you have all been waiting for: setting the counter offset from userspace. We expose a brand new capability that reports the ability to set the offset for both the virtual and physical sides. In keeping with the architecture, the offset is expressed as a delta that is subtracted from the physical counter value. Once this new API is used, there is no going back, and writes to the counters can no longer set the offsets implicitly (such writes are instead ignored). Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-8-maz@kernel.org
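The relationship, as a one-line sketch ('offset' stands for the userspace-provided delta):

    /* What the guest observes: the physical count minus the offset. */
    u64 guest_count = read_sysreg(cntpct_el0) - offset;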
|
#
96906a91 |
|
30-Mar-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM Being able to lock/unlock all vcpus in one go is a feature that only the vgic has enjoyed so far. Let's be brave and expose it to the world. Reviewed-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230330174800.2677007-7-maz@kernel.org
|
#
4bba7f7d |
|
27-Mar-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Use config_lock to protect data ordered against KVM_RUN There are various bits of VM-scoped data that can only be configured before the first call to KVM_RUN, such as the hypercall bitmaps and the PMU. As these fields are protected by the kvm->lock and accessed while holding vcpu->mutex, this is yet another example of lock inversion. Change out the kvm->lock for kvm->arch.config_lock in all of these instances. Opportunistically simplify the locking mechanics of the PMU configuration by holding the config_lock for the entirety of kvm_arm_pmu_v3_set_attr(). Note that this also addresses a couple of bugs. There is an unguarded read of the PMU version in KVM_ARM_VCPU_PMU_V3_FILTER which could race with KVM_ARM_VCPU_PMU_V3_SET_PMU. Additionally, until now writes to the per-vCPU vPMU irq were not serialized VM-wide, meaning concurrent calls to KVM_ARM_VCPU_PMU_V3_IRQ could lead to a false positive in pmu_irq_is_valid(). Cc: stable@vger.kernel.org Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-4-oliver.upton@linux.dev
|
#
c43120af |
|
27-Mar-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Avoid lock inversion when setting the VM register width kvm->lock must be taken outside of the vcpu->mutex. Of course, the locking documentation for KVM makes this abundantly clear. Nonetheless, the locking order in KVM/arm64 has been wrong for quite a while; we acquire the kvm->lock while holding the vcpu->mutex all over the shop. All was seemingly fine until commit 42a90008f890 ("KVM: Ensure lockdep knows about kvm->lock vs. vcpu->mutex ordering rule") caught us with our pants down, leading to lockdep barfing: ====================================================== WARNING: possible circular locking dependency detected 6.2.0-rc7+ #19 Not tainted ------------------------------------------------------ qemu-system-aar/859 is trying to acquire lock: ffff5aa69269eba0 (&host_kvm->lock){+.+.}-{3:3}, at: kvm_reset_vcpu+0x34/0x274 but task is already holding lock: ffff5aa68768c0b8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x8c/0xba0 which lock already depends on the new lock. Add a dedicated lock to serialize writes to VM-scoped configuration from the context of a vCPU. Protect the register width flags with the new lock, thus avoiding the need to grab the kvm->lock while holding vcpu->mutex in kvm_reset_vcpu(). Cc: stable@vger.kernel.org Reported-by: Jeremy Linton <jeremy.linton@arm.com> Link: https://lore.kernel.org/kvmarm/f6452cdd-65ff-34b8-bab0-5c06416da5f6@arm.com/ Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-3-oliver.upton@linux.dev
|
#
0acc7239 |
|
27-Mar-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON KVM/arm64 had the lock ordering backwards on vcpu->mutex and kvm->lock from the very beginning. One such example is the way vCPU resets are handled: the kvm->lock is acquired while handling a guest CPU_ON PSCI call. Add a dedicated lock to serialize writes to kvm_vcpu_arch::{mp_state, reset_state}. Promote all accessors of mp_state to {READ,WRITE}_ONCE() as readers do not acquire the mp_state_lock. While at it, plug yet another race by taking the mp_state_lock in the KVM_SET_MP_STATE ioctl handler. As changes to MP state are now guarded with a dedicated lock, drop the kvm->lock acquisition from the PSCI CPU_ON path. Similarly, move the reader of reset_state outside of the kvm->lock and instead protect it with the mp_state_lock. Note that writes to reset_state::reset have been demoted to regular stores as both readers and writers acquire the mp_state_lock. While the kvm->lock inversion still exists in kvm_reset_vcpu(), at least now PSCI CPU_ON no longer depends on it for serializing vCPU reset. Cc: stable@vger.kernel.org Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230327164747.2466958-2-oliver.upton@linux.dev
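The resulting pattern, sketched (field names as in the patch, the surrounding context illustrative):

    spin_lock(&vcpu->arch.mp_state_lock);
    WRITE_ONCE(vcpu->arch.mp_state.mp_state, KVM_MP_STATE_STOPPED);
    spin_unlock(&vcpu->arch.mp_state_lock);

    /* Lockless readers pair with the WRITE_ONCE() above: */
    u32 state = READ_ONCE(vcpu->arch.mp_state.mp_state);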
|
#
d8708b80 |
|
08-Feb-2023 |
Thomas Huth <thuth@redhat.com> |
KVM: Change return type of kvm_arch_vm_ioctl() to "int" All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long". Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20230208140105.655814-7-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
87727ba2 |
|
20-Apr-2023 |
Will Deacon <will@kernel.org> |
KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege Although pKVM supports CPU PMU emulation for non-protected guests since 722625c6f4c5 ("KVM: arm64: Reenable pmu in Protected Mode"), this relies on the PMU driver probing before the host has de-privileged so that the 'kvm_arm_pmu_available' static key can still be enabled by patching the hypervisor text. As it happens, both of these events hang off device_initcall() but the PMU consistently won the race until 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf"). Since then, the host will fail to boot when pKVM is enabled: | hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available | kvm [1]: nVHE hyp BUG at: [<ffff8000090366e0>] __kvm_nvhe_handle_host_mem_abort+0x270/0x284! | kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE | kvm [1]: Hyp Offset: 0xfffea41fbdf70000 | Kernel panic - not syncing: HYP panic: | PS:a00003c9 PC:0000dbe04b0c66e0 ESR:00000000f2000800 | FAR:fffffbfffddfcf00 HPFAR:00000000010b0bf0 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7-00083-g0bce6746d154 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x33c | nvhe_hyp_panic_handler+0x10c/0x190 | aarch64_insn_patch_text_nosync+0x64/0xc8 | arch_jump_label_transform+0x4c/0x5c | __jump_label_update+0x84/0xfc | jump_label_update+0x100/0x134 | static_key_enable_cpuslocked+0x68/0xac | static_key_enable+0x20/0x34 | kvm_host_pmu_init+0x88/0xa4 | armpmu_register+0xf0/0xf4 | arm_pmu_acpi_probe+0x2ec/0x368 | armv8_pmu_driver_init+0x38/0x44 | do_one_initcall+0xcc/0x240 Fix the race properly by deferring the de-privilege step to device_initcall_sync(). This will also be needed in future when probing IOMMU devices and allows us to separate the pKVM de-privilege logic from the core hypervisor initialisation path. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf") Tested-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230420123356.2708-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
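The essence of the fix, as a sketch (finalize_pkvm() is the de-privilege initcall named elsewhere in this log): move it one initcall level later so it runs after every plain device_initcall(), including the PMU driver probe:

    /* was: device_initcall(finalize_pkvm); */
    device_initcall_sync(finalize_pkvm);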
|
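The fix above relies on initcall ordering: everything registered with device_initcall() has run by the time device_initcall_sync() entries execute. A hedged sketch of the shape of the change (function names are illustrative):

	static int __init finalize_pkvm(void)
	{
		if (!is_protected_kvm_enabled())
			return 0;

		/* De-privilege the host; hyp text can no longer be patched. */
		return pkvm_drop_host_privileges();
	}
	/* _sync guarantees this runs after every plain device_initcall(). */
	device_initcall_sync(finalize_pkvm);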
#
e8162521 |
|
04-Apr-2023 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV2/3 to protected VMs The existing pKVM code attempts to advertise CSV2/3 using values initialized to 0, but never set. To advertise CSV2/3 to protected guests, pass the CSV2/3 values to hyp when initializing hyp's view of guests' ID_AA64PFR0_EL1. Similar to non-protected KVM, these are system-wide, rather than per cpu, for simplicity. Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers") Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20230404152321.413064-1-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
52882b9c |
|
04-May-2022 |
Alexey Kardashevskiy <aik@ozlabs.ru> |
KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE platform dependent When introduced, IRQFD resampling worked on POWER8 with XICS. However KVM on POWER9 has never implemented it - the compatibility mode code ("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native XIVE mode does not handle INTx in KVM at all. This moves the capability advertising into the platform code and stops advertising it on XIVE, i.e. POWER9 and later. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Anup Patel <anup@brainfault.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20220504074807.3616813-1-aik@ozlabs.ru> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
675cabc8 |
|
09-Feb-2023 |
Jintack Lim <jintack.lim@linaro.org> |
arm64: Add ARM64_HAS_NESTED_VIRT cpufeature Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the CPU has the ARMv8.3 nested virtualization capability, together with the 'kvm-arm.mode=nested' command line option. This will be used to support nested virtualization in KVM. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jintack Lim <jintack.lim@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [maz: moved the command-line option to kvm-arm.mode] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230209175820.1939006-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
5f623a59 |
|
06-Feb-2023 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Mark some VM-scoped allocations as __GFP_ACCOUNT Generally speaking, any memory allocations that can be associated with a particular VM should be charged to the cgroup of its process. Nonetheless, there are a couple of spots in KVM/arm64 that aren't currently accounted: - the ccsidr array containing the virtualized cache hierarchy - the cpumask of supported cpus, for use of the vPMU on heterogeneous systems Go ahead and set __GFP_ACCOUNT for these allocations. Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Link: https://lore.kernel.org/r/20230206235229.4174711-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
67d953d4 |
|
07-Feb-2023 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Fix non-kerneldoc comments The robots amongst us have started spitting out irritating emails about random errors such as: <quote> arch/arm64/kvm/arm.c:2207: warning: expecting prototype for Initialize Hyp(). Prototype was for kvm_arm_init() instead </quote> which makes little sense until you finally grok what they are on about: comments that look like a kerneldoc, but that aren't. Let's address this before I get even more irritated... ;-) Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/63e139e1.J5AHO6vmxaALh7xv%25lkp@intel.com Link: https://lore.kernel.org/r/20230207094321.1238600-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
8669651c |
|
01-Feb-2023 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Provide sanitized SYS_ID_AA64SMFR0_EL1 to nVHE We will shortly need a sanitized copy of SYS_ID_AA64SMFR0_EL1 in the nVHE EL2 code, so make sure to provide it with one. Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230201103755.1398086-2-qperret@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
#
81a1cf9f |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_check_processor_compat() hook Drop kvm_arch_check_processor_compat() and its support code now that all architecture implementations are nops. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-Id: <20221130230934.1014142-33-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a578a0a9 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop kvm_arch_{init,exit}() hooks Drop kvm_arch_init() and kvm_arch_exit() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-30-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
53bf620a |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init Tag kvm_arm_init() and its unique helper as __init, and tag data that is only ever modified under the kvm_arm_init() umbrella as read-only after init. Opportunistically name the boolean param in kvm_timer_hyp_init()'s prototype to match its definition. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-21-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1dc0f02d |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Do arm/arch initialization without bouncing through kvm_init() Do arm/arch specific initialization directly in arm's module_init(), now called kvm_arm_init(), instead of bouncing through kvm_init() to reach kvm_arch_init(). Invoking kvm_arch_init() is the very first action performed by kvm_init(), so from an initialization perspective this is a glorified nop. Avoiding kvm_arch_init() also fixes a mostly benign bug as kvm_arch_exit() doesn't properly unwind if a later stage of kvm_init() fails. While the soon-to-be-deleted comment about compiling as a module being unsupported is correct, kvm_arch_exit() can still be called by kvm_init() if any step after the call to kvm_arch_init() succeeds. Add a FIXME to call out that pKVM initialization isn't unwound if kvm_init() fails, which is a pre-existing problem inherited from kvm_arch_exit(). Making kvm_arch_init() a nop will also allow dropping kvm_arch_init() and kvm_arch_exit() entirely once all other architectures follow suit. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
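Roughly, the initialization flow after this change looks as follows; this is a sketch only, with the kvm_init() signature simplified and kvm_arm_init_arch() standing in for the arch-specific setup that used to live in kvm_arch_init():

	static __init int kvm_arm_init(void)
	{
		int err;

		err = kvm_arm_init_arch();	/* hypothetical: former kvm_arch_init() body */
		if (err)
			return err;

		/* FIXME: pKVM initialisation is not unwound if kvm_init() fails. */
		return kvm_init(sizeof(struct kvm_vcpu), 0, THIS_MODULE);
	}
	module_init(kvm_arm_init);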
#
78b3bf48 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Unregister perf callbacks if hypervisor finalization fails Undo everything done by init_subsystems() if a later initialization step fails, i.e. unregister perf callbacks in addition to unregistering the power management notifier. Fixes: bfa79a805454 ("KVM: arm64: Elevate hypervisor mappings creation at EL2") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6baaeda8 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Free hypervisor allocations if vector slot init fails Teardown hypervisor mode if vector slot setup fails in order to avoid leaking any allocations done by init_hyp_mode(). Fixes: b881cdce77b4 ("KVM: arm64: Allocate hyp vectors statically") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-18-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
466d27e4 |
|
30-Nov-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Simplify the CPUHP logic For a number of historical reasons, the KVM/arm64 hotplug setup is pretty complicated, and we have two extra CPUHP notifiers for vGIC and timers. It looks pretty pointless, and gets in the way of further changes. So let's just expose some helpers that can be called from the core CPUHP callback, and get rid of everything else. This gives us the opportunity to drop a useless notifier entry, as well as to tidy up the timer enable/disable logic, which was a bit odd. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221130230934.1014142-17-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
63a1bd8a |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Drop arch hardware (un)setup hooks Drop kvm_arch_hardware_setup() and kvm_arch_hardware_unsetup() now that all implementations are nops. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # s390 Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20221130230934.1014142-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3d0dba57 |
|
13-Nov-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: PMU: Move the ID_AA64DFR0_EL1.PMUver limit to VM creation As further patches will enable the selection of a PMU revision from userspace, sample the supported PMU revision at VM creation time, rather than recomputing it each time the ID_AA64DFR0_EL1 register is accessed. This shouldn't result in any change in behaviour. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221113163832.3154370-11-maz@kernel.org
|
#
73f38ef2 |
|
10-Nov-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to modify the variable arbitrarily, potentially leading to all sorts of shenanigans as this is used to configure the VTTBR register for the guest stage-2. In preparation for unmapping host sections entirely from EL2, maintain a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it from the host value while it is still trusted. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-23-will@kernel.org
|
#
fe41a7f8 |
|
10-Nov-2022 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host When pKVM is enabled, the hypervisor at EL2 does not trust the host at EL1 and must therefore prevent it from having unrestricted access to internal hypervisor state. The 'kvm_arm_hyp_percpu_base' array holds the offsets for hypervisor per-cpu allocations, so move this into the nVHE code where it cannot be modified by the untrusted host at EL1. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-22-will@kernel.org
|
#
315775ff |
|
10-Nov-2022 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Consolidate stage-2 initialisation into a single function The initialisation of guest stage-2 page-tables is currently split across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2(). That is presumably for historical reasons as kvm_arm_setup_stage2() originates from the (now defunct) KVM port for 32-bit Arm. Simplify this code path by merging both functions into one, taking care to map the 'struct kvm' into the hypervisor stage-1 early on in order to simplify the failure path. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-19-will@kernel.org
|
#
13e248aa |
|
10-Nov-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Provide I-cache invalidation by virtual address at EL2 In preparation for handling cache maintenance of guest pages from within the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou() which will later be plumbed into the stage-2 page-table cache maintenance callbacks, ensuring that the initial contents of pages mapped as executable into the guest stage-2 page-table are visible to the instruction fetcher. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-17-will@kernel.org
|
#
6c165223 |
|
10-Nov-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Initialise hypervisor copies of host symbols unconditionally The nVHE object at EL2 maintains its own copies of some host variables so that, when pKVM is enabled, the host cannot directly modify the hypervisor state. When running in normal nVHE mode, however, these variables are still mirrored at EL2 but are not initialised. Initialise the hypervisor symbols from the host copies regardless of pKVM, ensuring that any reference to this data at EL2 with normal nVHE will return a sensibly initialised value. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-16-will@kernel.org
|
#
9d0c063a |
|
10-Nov-2022 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 With the pKVM hypervisor at EL2 now offering hypercalls to the host for creating and destroying VM and vCPU structures, plumb these in to the existing arm64 KVM backend to ensure that the hypervisor data structures are allocated and initialised on first vCPU run for a pKVM guest. In the host, 'struct kvm_protected_vm' is introduced to hold the handle of the pKVM VM instance as well as to track references to the memory donated to the hypervisor so that it can be freed back to the host allocator following VM teardown. The stage-2 page-table, hypervisor VM and vCPU structures are allocated separately so as to avoid the need for a large physically-contiguous allocation in the host at run-time. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-14-will@kernel.org
|
#
579d7ebe |
|
03-Nov-2022 |
Ryan Roberts <ryan.roberts@arm.com> |
KVM: arm64: Fix kvm init failure when mode!=vhe and VA_BITS=52. For nvhe and protected modes, the hyp stage 1 page-tables were previously configured to have the same number of VA bits as the kernel's idmap. However, for kernel configs with VA_BITS=52 and where the kernel is loaded in physical memory below 48 bits, the idmap VA size is actually smaller than the kernel's normal stage 1 VA size. This can lead to kernel addresses that can't be mapped into the hypervisor, leading to kvm initialization failure during boot: kvm [1]: IPA Size Limit: 48 bits kvm [1]: Cannot map world-switch code kvm [1]: error initializing Hyp mode: -34 Fix this by ensuring that the hyp stage 1 VA size is the maximum of what's used for the idmap and the regular kernel stage 1. At the same time, refactor the code so that the hyp VA size is only calculated in one place. Prior to 7ba8f2b2d652, the idmap was always 52 bits for a 52-bit VA kernel and therefore the hyp stage 1 was also always 52 bits. Fixes: 7ba8f2b2d652 ("arm64: mm: use a 48-bit ID map when possible on 52-bit VA builds") Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> [maz: commit message fixes] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221103150507.32948-2-ryan.roberts@arm.com
|
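The core of the fix above reduces to taking the maximum of the two stage-1 VA sizes. A sketch, under the assumption that idmap_t0sz and vabits_actual are available as in contemporary arm64 code:

	u32 idmap_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
	u32 kernel_bits = vabits_actual;	/* kernel stage-1 VA size */

	/* Hyp stage 1 must cover whichever address space is larger. */
	hyp_va_bits = max(idmap_bits, kernel_bits);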
#
9cb1096f |
|
10-Nov-2022 |
Gavin Shan <gshan@redhat.com> |
KVM: arm64: Enable ring-based dirty memory tracking Enable ring-based dirty memory tracking on ARM64: - Enable CONFIG_HAVE_KVM_DIRTY_RING_ACQ_REL. - Enable CONFIG_NEED_KVM_DIRTY_RING_WITH_BITMAP. - Set KVM_DIRTY_LOG_PAGE_OFFSET for the ring buffer's physical page offset. - Add ARM64 specific kvm_arch_allow_write_without_running_vcpu() to keep the site of saving vgic/its tables out of the no-running-vcpu radar. Signed-off-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110104914.31280-5-gshan@redhat.com
|
#
d663b8a2 |
|
03-Nov-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: replace direct irq.h inclusion virt/kvm/irqchip.c is including "irq.h" from the arch-specific KVM source directory (i.e. not from arch/*/include) for the sole purpose of retrieving irqchip_in_kernel. Making the function inline in a header that is already included, such as asm/kvm_host.h, is not possible because it needs to look at struct kvm which is defined after asm/kvm_host.h is included. So add a kvm_arch_irqchip_in_kernel non-inline function; irqchip_in_kernel() is only performance critical on arm64 and x86, and the non-inline function is enough on all other architectures. irq.h can then be deleted from all architectures except x86. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
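On arm64, the new non-inline helper described above is a trivial wrapper; something like:

	bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
	{
		return irqchip_in_kernel(kvm);
	}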
#
b2a4d007 |
|
20-Sep-2022 |
Elliot Berman <quic_eberman@quicinc.com> |
KVM: arm64: Ignore kvm-arm.mode if !is_hyp_mode_available() Ignore kvm-arm.mode if !is_hyp_mode_available(). Specifically, we want to avoid switching kvm_mode to KVM_MODE_PROTECTED if hypervisor mode is not available. This prevents the "Protected KVM" cpu capability from being reported when Linux is booting in EL1 and would not have KVM enabled. Reasonably though, we should warn if the command line is requesting a KVM mode at all when KVM isn't actually available. Allow "kvm-arm.mode=none" to skip the warning since this would disable KVM anyway. Signed-off-by: Elliot Berman <quic_eberman@quicinc.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220920190658.2880184-1-quic_eberman@quicinc.com
|
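A sketch of the resulting early-param logic; the exact control flow of the real early_kvm_mode_cfg() may differ slightly:

	static int __init early_kvm_mode_cfg(char *arg)
	{
		if (!arg)
			return -EINVAL;

		/* "none" disables KVM anyway, so skip the warning. */
		if (strcmp(arg, "none") == 0) {
			kvm_mode = KVM_MODE_NONE;
			return 0;
		}

		if (!is_hyp_mode_available()) {
			pr_warn_once("KVM is not available. Ignoring kvm-arm.mode\n");
			return 0;
		}

		if (strcmp(arg, "protected") == 0) {
			kvm_mode = KVM_MODE_PROTECTED;
			return 0;
		}

		return -EINVAL;
	}
	early_param("kvm-arm.mode", early_kvm_mode_cfg);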
#
c59fb127 |
|
20-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: remove KVM_REQ_UNHALT KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. No functional change intended. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Message-Id: <20220921003201.1441511-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
522c9a64 |
|
08-Sep-2022 |
Zenghui Yu <yuzenghui@huawei.com> |
KVM: arm64: Use kmemleak_free_part_phys() to unregister hyp_mem_base With commit 0c24e061196c ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA"), kmemleak started to put the objects allocated with physical address onto object_phys_tree_root tree. The kmemleak_free_part() therefore no longer worked as expected on physically allocated objects (hyp_mem_base in this case) as it attempted to search and remove things in object_tree_root tree. Fix it by using kmemleak_free_part_phys() to unregister hyp_mem_base. This fixes an immediate crash when booting a KVM host in protected mode with kmemleak enabled. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220908130659.2021-1-yuzenghui@huawei.com
|
#
f3c6efc7 |
|
16-Aug-2022 |
Oliver Upton <oliver.upton@linux.dev> |
KVM: arm64: Treat PMCR_EL1.LC as RES1 on asymmetric systems KVM does not support AArch32 on asymmetric systems. To that end, enforce AArch64-only behavior on PMCR_EL1.LC when on an asymmetric system. Fixes: 2122a833316f ("arm64: Allow mismatched 32-bit EL0 support") Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220816192554.1455559-2-oliver.upton@linux.dev
|
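The enforcement amounts to treating the bit as always set when the guest writes PMCR_EL1; a hedged sketch (the wrapper name is illustrative, kvm_supports_32bit_el0() is the helper this change is built around):

	static u64 sanitise_pmcr_el1(u64 val)
	{
		/* No uniform AArch32 EL0: cycle counter overflow is 64-bit only. */
		if (!kvm_supports_32bit_el0())
			val |= ARMV8_PMU_PMCR_LC;

		return val;
	}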
#
db129d48 |
|
26-Jul-2022 |
Kalesh Singh <kaleshsingh@google.com> |
KVM: arm64: Implement non-protected nVHE hyp stack unwinder Implements the common framework necessary for unwind() to work for non-protected nVHE mode: - on_accessible_stack() - on_overflow_stack() - unwind_next() Non-protected nVHE unwind() is used by the host at EL1 to unwind and dump the hypervisor stacktrace. Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220726073750.3219117-11-kaleshsingh@google.com
|
#
9f968c92 |
|
05-Jul-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting We carry a legacy interface to set the base addresses for GICv2. As this is currently plumbed into the same handling code as the modern interface, it limits the evolution we can make there. Add a helper dedicated to this handling, with a view of maybe removing this in the future. Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
b4da9187 |
|
08-Jun-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move the handling of !FP outside of the fast path We currently start by assuming that the host owns the FP unit at load time, then check again whether this is the case as we are about to run. Only at this point do we account for the fact that there is a (vanishingly small) chance that we're running on a system without a FPSIMD unit (yes, this is madness). We can actually move this FPSIMD check as early as load-time, and drop the check at run time. No intended change in behaviour. Suggested-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
eebc538d |
|
27-May-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move vcpu WFIT flag to the state flag set The host kernel uses the WFIT flag to remember that a vcpu has used this instruction and wake it up as required. Move it to the state set, as nothing in the hypervisor uses this information. Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
699bb2e0 |
|
27-May-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move vcpu PC/Exception flags to the input flag set The PC update flags (which also deal with exception injection) are one of the most complicated uses of the flags we have. Make it more foolproof by: - moving it over to the new accessors and assigning it to the input flag set - turning the combination of generic ELx flags with another flag indicating the target EL itself into an explicit set of flags for each EL and vector combination - adding a new accessor to pend the exception This is otherwise a pretty straightforward conversion. Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
56961c63 |
|
16-Jun-2022 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Prevent kmemleak from accessing pKVM memory Commit a7259df76702 ("memblock: make memblock_find_in_range method private") changed the API using which memory is reserved for the pKVM hypervisor. However, memblock_phys_alloc() differs from the original API in terms of kmemleak semantics -- the old one didn't report the reserved regions to kmemleak while the new one does. Unfortunately, when protected KVM is enabled, all kernel accesses to pKVM-private memory result in a fatal exception, which can now happen because of kmemleak scans: $ echo scan > /sys/kernel/debug/kmemleak [ 34.991354] kvm [304]: nVHE hyp BUG at: [<ffff800008fa3750>] __kvm_nvhe_handle_host_mem_abort+0x270/0x290! [ 34.991580] kvm [304]: Hyp Offset: 0xfffe8be807e00000 [ 34.991813] Kernel panic - not syncing: HYP panic: [ 34.991813] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 34.991813] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 34.991813] VCPU:0000000000000000 [ 34.993660] CPU: 0 PID: 304 Comm: bash Not tainted 5.19.0-rc2 #102 [ 34.994059] Hardware name: linux,dummy-virt (DT) [ 34.994452] Call trace: [ 34.994641] dump_backtrace.part.0+0xcc/0xe0 [ 34.994932] show_stack+0x18/0x6c [ 34.995094] dump_stack_lvl+0x68/0x84 [ 34.995276] dump_stack+0x18/0x34 [ 34.995484] panic+0x16c/0x354 [ 34.995673] __hyp_pgtable_total_pages+0x0/0x60 [ 34.995933] scan_block+0x74/0x12c [ 34.996129] scan_gray_list+0xd8/0x19c [ 34.996332] kmemleak_scan+0x2c8/0x580 [ 34.996535] kmemleak_write+0x340/0x4a0 [ 34.996744] full_proxy_write+0x60/0xbc [ 34.996967] vfs_write+0xc4/0x2b0 [ 34.997136] ksys_write+0x68/0xf4 [ 34.997311] __arm64_sys_write+0x20/0x2c [ 34.997532] invoke_syscall+0x48/0x114 [ 34.997779] el0_svc_common.constprop.0+0x44/0xec [ 34.998029] do_el0_svc+0x2c/0xc0 [ 34.998205] el0_svc+0x2c/0x84 [ 34.998421] el0t_64_sync_handler+0xf4/0x100 [ 34.998653] el0t_64_sync+0x18c/0x190 [ 34.999252] SMP: stopping secondary CPUs [ 35.000034] Kernel Offset: disabled [ 35.000261] CPU features: 0x800,00007831,00001086 [ 35.000642] Memory Limit: none [ 35.001329] ---[ end Kernel panic - not syncing: HYP panic: [ 35.001329] PS:600003c9 PC:0000f418011a3750 ESR:00000000f2000800 [ 35.001329] FAR:ffff000439200000 HPFAR:0000000004792000 PAR:0000000000000000 [ 35.001329] VCPU:0000000000000000 ]--- Fix this by explicitly excluding the hypervisor's memory pool from kmemleak like we already do for the hyp BSS. Cc: Mike Rapoport <rppt@kernel.org> Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Signed-off-by: Quentin Perret <qperret@google.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220616161135.3997786-1-qperret@google.com
|
#
cde5042a |
|
09-Jun-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode() only returns KVM_MODE_PROTECTED on systems where the feature is available. Cc: David Brazdil <dbrazdil@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-4-will@kernel.org
|
#
ae187fec |
|
09-Jun-2022 |
Will Deacon <will@kernel.org> |
KVM: arm64: Return error from kvm_arch_init_vm() on allocation failure If we fail to allocate the 'supported_cpus' cpumask in kvm_arch_init_vm() then be sure to return -ENOMEM instead of success (0) on the failure path. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220609121223.2551-2-will@kernel.org
|
#
20492a62 |
|
16-May-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: pmu: Restore compilation when HW_PERF_EVENTS isn't selected Moving kvm_pmu_events into the vcpu (and referring to it) broke the somewhat unusual case where the kernel has no support for a PMU at all. In order to solve this, move things around a bit so that we can easily avoid referring to the pmu structure outside of PMU-aware code. As a bonus, pmu.c isn't compiled in when HW_PERF_EVENTS isn't selected. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/202205161814.KQHpOzsJ-lkp@intel.com
|
#
84d751a0 |
|
10-May-2022 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Pass pmu events to hyp via vcpu Instead of the host accessing hyp data directly, pass the pmu events of the current cpu to hyp via the vcpu. This adds 64 bits (in two fields) to the vcpu that need to be synced before every vcpu run in nvhe and protected modes. However, it isolates the hypervisor from the host, which allows us to use pmu in protected mode in a subsequent patch. No visible change in behaviour is intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220510095710.148178-4-tabba@google.com
|
#
f1f0c0cf |
|
28-Apr-2022 |
Alexandru Elisei <alexandru.elisei@arm.com> |
KVM: arm64: Don't BUG_ON() if emulated register table is unsorted To emulate a register access, KVM uses a table of registers sorted by register encoding to speed up queries using binary search. When Linux boots, KVM checks that the table is sorted and uses a BUG_ON() statement to let the user know if it's not. The unfortunate side effect is that an unsorted sysreg table brings down the whole kernel, not just KVM, even though the rest of the kernel can function just fine without KVM. To make matters worse, on machines which lack a serial console, the user is left pondering why the machine is taking so long to boot. Improve this situation by returning an error from kvm_arch_init() if the sysreg tables are not in the correct order. The machine is still very much usable, with the exception of virtualization, and the user can now easily determine what went wrong. A minor typo has also been corrected in the check_sysreg_table() function. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220428103405.70884-2-alexandru.elisei@arm.com
|
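A simplified sketch of the friendlier failure mode (the real check_sysreg_table() also validates reset handling; cmp_sys_reg() is the existing encoding comparator):

	static int check_sysreg_table(const struct sys_reg_desc *table, unsigned int n)
	{
		unsigned int i;

		for (i = 1; i < n; i++) {
			if (cmp_sys_reg(&table[i - 1], &table[i]) >= 0) {
				kvm_err("sys_reg table %p entry %d out of order\n",
					table, i);
				return 1;	/* caller turns this into an init error */
			}
		}

		return 0;
	}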
#
bfbab445 |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Implement PSCI SYSTEM_SUSPEND ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows software to request that a system be placed in the deepest possible low-power state. Effectively, software can use this to suspend itself to RAM. Unfortunately, there really is no good way to implement a system-wide PSCI call in KVM. Any precondition checks done in the kernel will need to be repeated by userspace since there is no good way to protect a critical section that spans an exit to userspace. SYSTEM_RESET and SYSTEM_OFF are equally plagued by this issue, although no users have seemingly cared for the relatively long time these calls have been supported. The solution is to just make the whole implementation userspace's problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND. Additionally, add a CAP to get buy-in from userspace for this new exit type. Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in. If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide explicit documentation of userspace's responsibilities for the exit and point to the PSCI specification to describe the actual PSCI call. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
|
#
7b33a09d |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Add support for userspace to suspend a vCPU Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU is in a suspended state. In the suspended state the vCPU will block until a wakeup event (pending interrupt) is recognized. Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to userspace that KVM has recognized one such wakeup event. It is the responsibility of userspace to then make the vCPU runnable, or leave it suspended until the next wakeup event. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
|
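From userspace's side, the contract in the entry above might be exercised like this; a hedged sketch with error handling omitted:

	#include <linux/kvm.h>
	#include <sys/ioctl.h>

	static void handle_wakeup(int vcpu_fd, struct kvm_run *run)
	{
		if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
		    run->system_event.type == KVM_SYSTEM_EVENT_WAKEUP) {
			struct kvm_mp_state st = { .mp_state = KVM_MP_STATE_RUNNABLE };

			/* Make the suspended vCPU runnable again. */
			ioctl(vcpu_fd, KVM_SET_MP_STATE, &st);
		}
	}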
#
3fdd0459 |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Return a value from check_vcpu_requests() A subsequent change to KVM will introduce a vCPU request that could result in an exit to userspace. Change check_vcpu_requests() to return a value and document the function. Unconditionally return 1 for now. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-6-oupton@google.com
|
#
1c6219e3 |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Rename the KVM_REQ_SLEEP handler The naming of the kvm_req_sleep function is confusing: the function itself sleeps the vCPU; it does not request such an event. Rename the function to make its purpose more clear. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-5-oupton@google.com
|
#
b171f9bb |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Track vCPU power state using MP state values A subsequent change to KVM will add support for additional power states. Store the MP state by value rather than keeping track of it as a boolean. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-4-oupton@google.com
|
#
1e579429 |
|
03-May-2022 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Dedupe vCPU power off helpers vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the former and replace all call sites of the latter. No functional change intended. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-3-oupton@google.com
|
#
05714cab |
|
02-May-2022 |
Raghavendra Rao Ananta <rananta@google.com> |
KVM: arm64: Setup a framework for hypercall bitmap firmware registers KVM regularly introduces new hypercall services to the guests without any consent from the userspace. This means the guests can observe hypercall services come and go as they migrate across various host kernel versions. This could be a major problem if the guest discovered a hypercall, started using it, and after getting migrated to an older kernel realizes that it's no longer available. Depending on how the guest handles the change, there's a chance that the guest would just panic. As a result, there's a need for the userspace to elect the services that it wishes the guest to discover. It can elect these services based on the kernels spread across its (migration) fleet. To remedy this, extend the existing firmware pseudo-registers, such as KVM_REG_ARM_PSCI_VERSION, by creating a new COPROC register space for all the hypercall services available. These firmware registers are categorized based on the service call owners, but unlike the existing firmware pseudo-registers, they hold the features supported in the form of a bitmap. During the VM initialization, the registers are set to the upper limit of the features supported by the corresponding registers. It's expected that the VMMs discover the features provided by each register via GET_ONE_REG, and write back the desired values using SET_ONE_REG. KVM allows this modification only until the VM has started. Some of the standard features are not mapped to any bits of the registers. But since they can recreate the original problem of making it available without userspace's consent, they need to be explicitly added to the case-list in kvm_hvc_call_default_allowed(). Any function-id that's not enabled via the bitmap, or not listed in kvm_hvc_call_default_allowed, will be returned as SMCCC_RET_NOT_SUPPORTED to the guest. Older userspace code can simply ignore the feature and the hypercall services will be exposed unconditionally to the guests, thus ensuring backward compatibility. In this patch, the framework adds the register only for ARM's standard secure services (owner value 4). Currently, this includes support only for ARM True Random Number Generator (TRNG) service, with bit-0 of the register representing mandatory features of v1.0. Other services are added in the upcoming patches. Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Reviewed-by: Gavin Shan <gshan@redhat.com> [maz: reduced the scope of some helpers, tidy-up bitmap max values, dropped error-only fast path] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220502233853.1233742-3-rananta@google.com
|
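The intended VMM flow, sketched in userspace C (the register constant is the standard-services bitmap this series adds; treat the details as illustrative):

	#include <linux/kvm.h>
	#include <stdint.h>
	#include <sys/ioctl.h>

	static int restrict_std_services(int vcpu_fd, uint64_t allowed)
	{
		uint64_t bmap;
		struct kvm_one_reg reg = {
			.id   = KVM_REG_ARM_STD_BMAP,
			.addr = (uint64_t)&bmap,
		};

		if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg))
			return -1;

		bmap &= allowed;	/* keep only fleet-wide services */

		/* Must happen before the VM first runs. */
		return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
	}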
#
f502cc56 |
|
04-Mar-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: Add max_vcpus field in common 'struct kvm' For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to the TDX guest) before creating any vcpu. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return an error if the number of vcpus exceeds it. Because there is already a max_vcpus member in the arm64 struct kvm_arch, move it to the common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to the x86 struct kvm_arch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Message-Id: <e53234cdee6a92357d06c80c03d77c19cdefb804.1646422845.git.isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ce335431 |
|
20-Apr-2022 |
Kalesh Singh <kaleshsingh@google.com> |
KVM: arm64: Add guard pages for KVM nVHE hypervisor stack Map the stack pages in the flexible private VA range and allocate guard pages below the stack as unbacked VA space. The stack is aligned so that any valid stack address has the PAGE_SHIFT bit set to 1 - this is used for overflow detection (implemented in a subsequent patch in the series). Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220420214317.3303360-4-kaleshsingh@google.com
|
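The alignment makes overflow detection a single bit test; a sketch under the stated invariant that any valid stack address has bit PAGE_SHIFT set (the helper name is hypothetical, the real check lives in hyp assembly):

	static inline bool hyp_stack_overflowed(unsigned long sp)
	{
		/* Bit PAGE_SHIFT clear => sp has fallen into the guard page. */
		return !(sp & PAGE_SIZE);
	}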
#
89f5074c |
|
19-Apr-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Handle blocking WFIT instruction When trapping a blocking WFIT instruction, take it into account when computing the deadline of the background timer. The state is tracked with a new vcpu flag, and is gated by a new CPU capability, which isn't currently enabled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220419182755.601427-6-maz@kernel.org
|
#
b57de4ff |
|
19-Apr-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Simplify kvm_cpu_has_pending_timer() kvm_cpu_has_pending_timer() ends up checking all the possible timers for a wake-up cause. However, we already check for pending interrupts whenever we try to wake-up a vcpu, including the timer interrupts. Obviously, doing the same work twice is once too many. Reduce this helper to almost nothing, but keep it around, as we are going to make use of it soon. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220419182755.601427-4-maz@kernel.org
|
#
2e403167 |
|
13-May-2022 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Don't hypercall before EL2 init Will reported the following splat when running with Protected KVM enabled: [ 2.427181] ------------[ cut here ]------------ [ 2.427668] WARNING: CPU: 3 PID: 1 at arch/arm64/kvm/mmu.c:489 __create_hyp_private_mapping+0x118/0x1ac [ 2.428424] Modules linked in: [ 2.429040] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc2-00084-g8635adc4efc7 #1 [ 2.429589] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 2.430286] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.430734] pc : __create_hyp_private_mapping+0x118/0x1ac [ 2.431091] lr : create_hyp_exec_mappings+0x40/0x80 [ 2.431377] sp : ffff80000803baf0 [ 2.431597] x29: ffff80000803bb00 x28: 0000000000000000 x27: 0000000000000000 [ 2.432156] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 [ 2.432561] x23: ffffcd96c343b000 x22: 0000000000000000 x21: ffff80000803bb40 [ 2.433004] x20: 0000000000000004 x19: 0000000000001800 x18: 0000000000000000 [ 2.433343] x17: 0003e68cf7efdd70 x16: 0000000000000004 x15: fffffc81f602a2c8 [ 2.434053] x14: ffffdf8380000000 x13: ffffcd9573200000 x12: ffffcd96c343b000 [ 2.434401] x11: 0000000000000004 x10: ffffcd96c1738000 x9 : 0000000000000004 [ 2.434812] x8 : ffff80000803bb40 x7 : 7f7f7f7f7f7f7f7f x6 : 544f422effff306b [ 2.435136] x5 : 000000008020001e x4 : ffff207d80a88c00 x3 : 0000000000000005 [ 2.435480] x2 : 0000000000001800 x1 : 000000014f4ab800 x0 : 000000000badca11 [ 2.436149] Call trace: [ 2.436600] __create_hyp_private_mapping+0x118/0x1ac [ 2.437576] create_hyp_exec_mappings+0x40/0x80 [ 2.438180] kvm_init_vector_slots+0x180/0x194 [ 2.458941] kvm_arch_init+0x80/0x274 [ 2.459220] kvm_init+0x48/0x354 [ 2.459416] arm_init+0x20/0x2c [ 2.459601] do_one_initcall+0xbc/0x238 [ 2.459809] do_initcall_level+0x94/0xb4 [ 2.460043] do_initcalls+0x54/0x94 [ 2.460228] do_basic_setup+0x1c/0x28 [ 2.460407] kernel_init_freeable+0x110/0x178 [ 2.460610] kernel_init+0x20/0x1a0 [ 2.460817] ret_from_fork+0x10/0x20 [ 2.461274] ---[ end trace 0000000000000000 ]--- Indeed, the Protected KVM mode promotes __create_hyp_private_mapping() to a hypercall as EL1 no longer has access to the hypervisor's stage-1 page-table. However, the call from kvm_init_vector_slots() happens after pKVM has been initialized on the primary CPU, but before it has been initialized on secondaries. As such, if the KVM initcall procedure is migrated from one CPU to another in this window, the hypercall may end up running on a CPU for which EL2 has not been initialized. Fortunately, the pKVM hypervisor doesn't rely on the host to re-map the vectors in the private range, so the hypercall in question is in fact superfluous. Skip it when pKVM is enabled. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> [maz: simplified the checks slightly] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220513092607.35233-1-qperret@google.com
|
#
18f3976f |
|
24-Apr-2022 |
Alexandru Elisei <alexandru.elisei@arm.com> |
KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_high When userspace is debugging a VM, the kvm_debug_exit_arch part of the kvm_run struct contains arm64 specific debug information: the ESR_EL2 value, encoded in the field "hsr", and the address of the instruction that caused the exception, encoded in the field "far". Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately kvm_debug_exit_arch.hsr cannot be changed because that would change the memory layout of the struct on big endian machines: Current layout: | Layout with "hsr" extended to 64 bits: | offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32]) offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0]) offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far) which breaks existing code. The padding is inserted by the compiler because the "far" field must be aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1], page 18), and the struct itself must be aligned to 8 bytes (the struct must be aligned to the maximum alignment of its fields - aapcs64, page 18), which means that "hsr" must be aligned to 8 bytes as it is the first field in the struct. To avoid changing the struct size and layout for the existing fields, add a new field, "hsr_high", which replaces the existing padding. "hsr_high" will be used to hold the ESR_EL2[61:32] bits of the register. The memory layout, both on big and little endian machines, becomes: offset 0: ESR_EL2[31:0] (hsr) offset 4: ESR_EL2[61:32] (hsr_high) offset 8: FAR_EL2[61:0] (far) The padding that the compiler inserts for the current struct layout is uninitialized. To prevent an updated userspace running on an old kernel from mistaking the padding for a valid "hsr_high" value, add a new flag, KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that "hsr_high" holds a valid ESR_EL2[61:32] value. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
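For reference, the resulting uapi struct (offsets as in the table above); KVM_DEBUG_ARCH_HSR_HIGH_VALID in kvm_run->flags gates the new field:

	struct kvm_debug_exit_arch {
		__u32 hsr;	/* ESR_EL2[31:0] */
		__u32 hsr_high;	/* ESR_EL2[61:32], valid iff the flag is set */
		__u64 far;	/* FAR_EL2 */
	};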
#
21ea4578 |
|
18-Mar-2022 |
Julia Lawall <Julia.Lawall@inria.fr> |
KVM: arm64: fix typos in comments Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220318103729.157574-24-Julia.Lawall@inria.fr
|
#
06394531 |
|
11-Mar-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Generalise VM features into a set of flags We currently deal with a set of booleans for VM features, while they could be better represented as a set of flags contained in an unsigned long, similarly to what we are doing on the CPU side. Signed-off-by: Marc Zyngier <maz@kernel.org> [Oliver: Flag-ify the 'ran_once' boolean] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220311174001.605719-2-oupton@google.com
|
#
f7659f8b |
|
03-Mar-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Only open the interrupt window on exit due to an interrupt Now that we properly account for interrupts taken whilst the guest was running, it becomes obvious that there is no need to open this accounting window if we didn't exit because of an interrupt. This saves a number of system register accesses and other barriers if we exited for any other reason (such as a trap, for example). Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20220304135914.1464721-1-maz@kernel.org
|
#
583cda1b |
|
27-Jan-2022 |
Alexandru Elisei <alexandru.elisei@arm.com> |
KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU device ioctl. If the VCPU is scheduled on a physical CPU which has a different PMU, the perf events needed to emulate a guest PMU won't be scheduled in and the guest performance counters will stop counting. Treat it as a userspace error and refuse to run the VCPU in this situation. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
|
#
5177fe91 |
|
27-Jan-2022 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Do not change the PMU event filter after a VCPU has run Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers. Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that. Keep track when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening. Signed-off-by: Marc Zyngier <maz@kernel.org> [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
|
#
100b4f09 |
|
21-Nov-2021 |
Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> |
KVM: arm64: Make active_vmids invalid on vCPU schedule out Like ASID allocator, we copy the active_vmids into the reserved_vmids on a rollover. But it's unlikely that every CPU will have a vCPU as current task and we may end up unnecessarily reserving the VMID space. Hence, set active_vmids to an invalid one when scheduling out a vCPU. Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211122121844.867-5-shameerali.kolothum.thodi@huawei.com
|
#
3248136b |
|
21-Nov-2021 |
Julien Grall <julien.grall@arm.com> |
KVM: arm64: Align the VMID allocation with the arm64 ASID At the moment, the VMID algorithm will send an SGI to all the CPUs to force an exit and then broadcast a full TLB flush and I-Cache invalidation. This patch uses the new VMID allocator. The benefits are: - Aligns with arm64 ASID algorithm. - CPUs are not forced to exit at roll-over. Instead, the VMID will be marked reserved and context invalidation is broadcasted. This will reduce the IPIs traffic. - More flexible to add support for pinned KVM VMIDs in the future. With the new algo, the code is now adapted: - The call to update_vmid() will be done with preemption disabled as the new algo requires to store information per-CPU. Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211122121844.867-4-shameerali.kolothum.thodi@huawei.com
|
#
def8c222 |
|
23-Feb-2022 |
Vladimir Murzin <vladimir.murzin@arm.com> |
arm64: Add support of PAuth QARMA3 architected algorithm QARMA3 is a relaxed version of the QARMA5 algorithm, which is expected to reduce the latency of calculation while still delivering a suitable level of security. Support for QARMA3 can be discovered via ID_AA64ISAR2_EL1: APA3, bits [15:12], indicates whether the QARMA3 algorithm is implemented in the PE for address authentication in AArch64 state; GPA3, bits [11:8], indicates whether the QARMA3 algorithm is implemented in the PE for generic code authentication in AArch64 state. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220224124952.119612-4-vladimir.murzin@arm.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
5bdf3437 |
|
16-Nov-2021 |
James Morse <james.morse@arm.com> |
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A CPUs vulnerable to Spectre-BHB either need to make an SMC-CC firmware call from the vectors, or run a sequence of branches. This gets added to the hyp vectors. If there is no support for arch-workaround-1 in firmware, the indirect vector will be used. kvm_init_vector_slots() only initialises the two indirect slots if the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors() only initialises __hyp_bp_vect_base if the platform is vulnerable to Spectre-v3a. As there are about to be more users of the indirect vectors, ensure their entries in hyp_spectre_vector_selector[] are always initialised, and __hyp_bp_vect_base defaults to the regular VA mapping. The Spectre-v3a check is moved to a helper kvm_system_needs_idmapped_vectors(), and merged with the code that creates the hyp mappings. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com>
|
#
8cfe148a |
|
01-Feb-2022 |
Mark Rutland <mark.rutland@arm.com> |
kvm/arm64: rework guest entry logic In kvm_arch_vcpu_ioctl_run() we enter an RCU extended quiescent state (EQS) by calling guest_enter_irqoff(), and unmask IRQs prior to exiting the EQS by calling guest_exit(). As the IRQ entry code will not wake RCU in this case, we may run the core IRQ code and IRQ handler without RCU watching, leading to various potential problems. Additionally, we do not inform lockdep or tracing that interrupts will be enabled during guest execution, which can lead to misleading traces and warnings that interrupts have been enabled for overly-long periods. This patch fixes these issues by using the new timing and context entry/exit helpers to ensure that interrupts are handled during guest vtime but with RCU watching, with a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); Since instrumentation may make use of RCU, we must also ensure that no instrumented code is run during the EQS. I've split out the critical section into a new kvm_arm_enter_exit_vcpu() helper which is marked noinstr. Fixes: 1b3d546daf85ed2b ("arm/arm64: KVM: Properly account for guest CPU time") Reported-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Message-Id: <20220201132926.3301912-3-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
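Put together as a function, the sequence above reads roughly as below; kvm_arm_vcpu_run_once() is a hypothetical wrapper name, while kvm_arm_enter_exit_vcpu() is the noinstr helper named in the message:

	static int kvm_arm_vcpu_run_once(struct kvm_vcpu *vcpu)
	{
		int ret;

		guest_timing_enter_irqoff();
		guest_state_enter_irqoff();		/* enter the RCU EQS */
		ret = kvm_arm_enter_exit_vcpu(vcpu);	/* noinstr critical section */
		guest_state_exit_irqoff();		/* exit the EQS, RCU watching */

		local_irq_enable();			/* take any pending IRQs */
		local_irq_disable();

		guest_timing_exit_irqoff();
		return ret;
	}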
#
52b28657 |
|
15-Dec-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: pkvm: Unshare guest structs during teardown Make use of the newly introduced unshare hypercall during guest teardown to unmap guest-related data structures from the hyp stage-1. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-15-qperret@google.com
|
#
3f868e14 |
|
15-Dec-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Introduce kvm_share_hyp() The create_hyp_mappings() function can currently be called at any point in time. However, its behaviour in protected mode changes widely depending on when it is being called. Prior to KVM init, it is used to create the temporary page-table used to bring-up the hypervisor, and later on it is transparently turned into a 'share' hypercall when the kernel has lost control over the hypervisor stage-1. In order to prepare the ground for also unsharing pages with the hypervisor during guest teardown, introduce a kvm_share_hyp() function to make it clear in which places a share hypercall should be expected, as we will soon need a matching unshare hypercall in all those places. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211215161232.1480836-7-qperret@google.com
|
#
d92a5d1c |
|
08-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Add helpers to wake/query blocking vCPU Add helpers to wake and query a blocking vCPU. In addition to providing nice names, the helpers reduce the probability of KVM neglecting to use kvm_arch_vcpu_get_wait(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-20-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
91b99ea7 |
|
08-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt() Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6109c5a6 |
|
08-Oct-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Move vGIC v4 handling for WFI out arch callback hook Move the put and reload of the vGIC out of the block/unblock callbacks and into a dedicated WFI helper. Functionally, this is nearly a nop as the block hook is called at the very beginning of kvm_vcpu_block(), and the only code in kvm_vcpu_block() after the unblock hook is to update the halt-polling controls, i.e. can only affect the next WFI. Back when the arch (un)blocking hooks were added by commits 3217f7c25bca ("KVM: Add kvm_arch_vcpu_{un}blocking callbacks) and d35268da6687 ("arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block"), the hooks were invoked only when KVM was about to "block", i.e. schedule out the vCPU. The use case at the time was to schedule a timer in the host based on the earliest timer in the guest in order to wake the blocking vCPU when the emulated guest timer fired. Commit accb99bcd0ca ("KVM: arm/arm64: Simplify bg_timer programming") reworked the timer logic to be even more precise, by waiting until the vCPU was actually scheduled out, and so move the timer logic from the (un)blocking hooks to vcpu_load/put. In the meantime, the hooks gained usage for enabling vGIC v4 doorbells in commit df9ba95993b9 ("KVM: arm/arm64: GICv4: Use the doorbell interrupt as an unblocking source"), and added related logic for the VMCR in commit 5eeaf10eec39 ("KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block"). Finally, commit 07ab0f8d9a12 ("KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence") hoisted the (un)blocking hooks so that they wrapped KVM's halt-polling logic in addition to the core "block" logic. In other words, the original need for arch hooks to take action _only_ in the block path is long since gone. Cc: Oliver Upton <oupton@google.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
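A simplified sketch of such a dedicated WFI helper, assuming kvm_vgic_put()/kvm_vgic_load() are the (un)blocking actions being hoisted out of the hooks and using the kvm_vcpu_halt() name from the rename above; the merged helper carries more vGIC v4 doorbell detail:

    static void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
    {
        /* Sync GIC state back and arm GICv4 doorbells before blocking. */
        preempt_disable();
        kvm_vgic_put(vcpu);
        preempt_enable();

        kvm_vcpu_halt(vcpu);    /* block until the vCPU is runnable again */

        preempt_disable();
        kvm_vgic_load(vcpu);
        preempt_enable();
    }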
#
46808a4c |
|
16-Nov-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: Use 'unsigned long' as kvm_for_each_vcpu()'s index Everywhere we use kvm_for_each_vcpu(), we use an int as the vcpu index. Unfortunately, we're about to rework the iterator, which requires this to be upgraded to an unsigned long. Let's bite the bullet and repaint all of it in one go. Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-7-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
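After the repaint, every iteration site follows the same pattern; a representative (hypothetical) caller:

    struct kvm_vcpu *vcpu;
    unsigned long i;    /* was 'int' before this change */

    kvm_for_each_vcpu(i, vcpu, kvm)
        kvm_vcpu_kick(vcpu);    /* any per-vCPU operation */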
#
27592ae8 |
|
16-Nov-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: Move wiping of the kvm->vcpus array to common code All architectures have similar loops iterating over the vcpus, freeing one vcpu at a time, and eventually wiping the reference off the vcpus array. They are also inconsistently taking the kvm->lock mutex when wiping the references from the array. Make this code common, which will simplify further changes. The locking is dropped altogether, as this should only be called when there are no further references on the kvm structure. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Message-Id: <20211116160403.4074052-2-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
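A sketch of the consolidated helper, assuming the per-architecture loops reduce to roughly this shape (lock-free, per the reasoning above):

    void kvm_destroy_vcpus(struct kvm *kvm)
    {
        unsigned long i;
        struct kvm_vcpu *vcpu;

        /* No locking: there must be no remaining users of 'kvm' here. */
        kvm_for_each_vcpu(i, vcpu, kvm) {
            kvm_vcpu_destroy(vcpu);
            kvm->vcpus[i] = NULL;
        }

        atomic_set(&kvm->online_vcpus, 0);
    }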
#
cc5705fb |
|
13-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Drop vcpu->arch.has_run_once for vcpu->pid With the transition to kvm_arch_vcpu_run_pid_change() to handle the "run once" activities, it becomes obvious that has_run_once is now an exact shadow of vcpu->pid. Replace vcpu->arch.has_run_once with a new vcpu_has_run_once() helper that directly checks for vcpu->pid, and get rid of the now unused field. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
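The replacement helper can be as small as the following sketch, relying on vcpu->pid being assigned on the first KVM_RUN:

    /* vcpu->pid is set on first KVM_RUN, so it doubles as "has run once". */
    #define vcpu_has_run_once(vcpu)    !!rcu_access_pointer((vcpu)->pid)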
#
b5aa368a |
|
14-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Merge kvm_arch_vcpu_run_pid_change() and kvm_vcpu_first_run_init() The kvm_arch_vcpu_run_pid_change() helper gets called on each PID change. The kvm_vcpu_first_run_init() helper gets run on the... first run(!) of a vcpu. As it turns out, the first run of a vcpu also triggers a PID change event (vcpu->pid is initially NULL). Use this property to merge these two helpers and get rid of another arm64-specific oddity. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
1408e73d |
|
13-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Restructure the point where has_run_once is advertised Restructure kvm_vcpu_first_run_init() to set the has_run_once flag after having completed all the "run once" activities. This includes moving the flip of the userspace irqchip static key to a point where nothing can fail. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
052f064d |
|
14-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move kvm_arch_vcpu_run_pid_change() out of line Having kvm_arch_vcpu_run_pid_change() inline doesn't bring anything to the table. Move it next to kvm_vcpu_first_run_init(), which will be convenient for what is next to come. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
af9a0e21 |
|
21-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE We currently have to maintain a mapping of the thread_info structure at EL2 in order to be able to check the TIF_FOREIGN_FPSTATE flag. In order to eventually get rid of this, start with a vcpu flag that shadows the thread flag on each entry into the hypervisor. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
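A sketch of the shadowing step on entry, assuming a KVM_ARM64_FP_FOREIGN_FPSTATE vcpu flag that mirrors the thread flag; the helper name here is illustrative:

    /* Runs on the host, just before entering the hypervisor. */
    static void kvm_vcpu_sync_fp_flag(struct kvm_vcpu *vcpu)
    {
        if (test_thread_flag(TIF_FOREIGN_FPSTATE))
            vcpu->arch.flags |= KVM_ARM64_FP_FOREIGN_FPSTATE;
        else
            vcpu->arch.flags &= ~KVM_ARM64_FP_FOREIGN_FPSTATE;
    }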
#
17ed14eb |
|
10-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Drop perf.c and fold its tiny bits of code into arm.c Call KVM's (un)register perf callbacks helpers directly from arm.c and delete perf.c. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211111020738.2512932-17-seanjc@google.com
|
#
e1bfc245 |
|
10-Nov-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: Move x86's perf guest info callbacks to generic KVM Move x86's perf guest callbacks into common KVM, as they are semantically identical to arm64's callbacks (the only other such KVM callbacks). arm64 will convert to the common versions in a future patch. Implement the necessary arm64 arch hooks now to avoid having to provide stubs or a temporary #define (from x86) to avoid arm64 compilation errors when CONFIG_GUEST_PERF_EVENTS=y. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211111020738.2512932-13-seanjc@google.com
|
#
f60a00d7 |
|
16-Nov-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: arm64: Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() Generally, it doesn't make sense to return a recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs. Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs depending on whether it is a system-wide ioctl or a per-VM one. Previously, KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() which is what gets returned by system-wide KVM_CAP_MAX_VCPUS. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20211116163443.88707-2-vkuznets@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
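Using the names from the message above, the capped value amounts to the following sketch in the extension-check ioctl:

    case KVM_CAP_NR_VCPUS:
        /*
         * Advisory only: recommend no more vCPUs than the system-wide
         * KVM_CAP_MAX_VCPUS value would allow.
         */
        r = min_t(unsigned int, num_online_cpus(),
                  kvm_arm_default_max_vcpus());
        break;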
#
08e873cb |
|
04-Nov-2021 |
YueHaibing <yuehaibing@huawei.com> |
KVM: arm64: Change the return type of kvm_vcpu_preferred_target() kvm_vcpu_preferred_target() always returns 0 because kvm_target_cpu() never returns a negative error code. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105011500.16280-1-yuehaibing@huawei.com
|
#
115bae92 |
|
07-Sep-2021 |
Jia He <justin.he@arm.com> |
KVM: arm64: Add memcg accounting to KVM allocations Inspired by commit 254272ce6505 ("kvm: x86: Add memcg accounting to KVM allocations"), it would be better to make arm64 KVM consistent with the common KVM code. VM-scoped memory allocations should be charged to the VM process's cgroup, hence change GFP_KERNEL to GFP_KERNEL_ACCOUNT. A few cases remain unchanged, as those allocations are global rather than VM-scoped. Signed-off-by: Jia He <justin.he@arm.com> Reviewed-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210907123112.10232-3-justin.he@arm.com
|
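The change itself is mechanical; a representative before/after sketch, with the allocation charged to the VM's memory cgroup:

    /* Before: not charged to the VM's memory cgroup. */
    buf = kzalloc(size, GFP_KERNEL);

    /* After: VM-scoped allocations are accounted against the VM process. */
    buf = kzalloc(size, GFP_KERNEL_ACCOUNT);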
#
2a0c3433 |
|
10-Oct-2021 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Initialize trap registers for protected VMs Protected VMs have more restricted features that need to be trapped. Moreover, the host should not be trusted to set the appropriate trapping registers and their values. Initialize the trapping registers, i.e., hcr_el2, mdcr_el2, and cptr_el2 at EL2 for protected guests, based on the values of the guest's feature id registers. No functional change intended as trap handlers introduced in the previous patch are still not hooked in to the guest exit handlers. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-9-tabba@google.com
|
#
6c30bfb1 |
|
10-Oct-2021 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Add handlers for protected VM System Registers Add system register handlers for protected VMs. These cover Sys64 registers (including feature id registers), and debug. No functional change intended as these are not hooked in yet to the guest exit handlers introduced earlier. So when trapping is triggered, the exit handlers let the host handle it, as before. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211010145636.1950948-8-tabba@google.com
|
#
b6a68b97 |
|
01-Oct-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Allow KVM to be disabled from the command line Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Reviewed-by: David Brazdil <dbrazdil@google.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Andrew Scull <ascull@google.com> Link: https://lore.kernel.org/r/20211001170553.3062988-1-maz@kernel.org
|
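Assuming the new mode is surfaced as KVM_MODE_NONE, the init-time check reduces to a sketch like:

    /* In kvm_arch_init(): honour disabling KVM from the command line. */
    if (kvm_get_mode() == KVM_MODE_NONE) {
        kvm_info("KVM disabled from kernel command line\n");
        return -ENODEV;
    }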
#
2f2e1a50 |
|
08-Oct-2021 |
Will Deacon <will@kernel.org> |
KVM: arm64: Propagate errors from __pkvm_prot_finalize hypercall If the __pkvm_prot_finalize hypercall returns an error, we WARN but fail to propagate the failure code back to kvm_arch_init(). Pass a pointer to a zero-initialised return variable so that failure to finalise the pKVM protections on a host CPU can be reported back to KVM. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211008135839.1193-5-will@kernel.org
|
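The shape of the fix, per the description above; a sketch with the wrapper name taken as illustrative:

    static void _kvm_host_prot_finalize(void *arg)
    {
        int *err = arg;

        if (WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize)))
            WRITE_ONCE(*err, -EINVAL);
    }

    static int pkvm_drop_host_privileges(void)
    {
        int ret = 0;

        /* Run on every CPU; any failure is reported through 'ret'. */
        on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
        return ret;
    }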
#
8579a185 |
|
08-Oct-2021 |
Will Deacon <will@kernel.org> |
KVM: arm64: Reject stub hypercalls after pKVM has been initialised The stub hypercalls provide mechanisms to reset and replace the EL2 code, so uninstall them once pKVM has been initialised in order to ensure the integrity of the hypervisor code. To ensure pKVM initialisation remains functional, split cpu_hyp_reinit() into two helper functions to separate usage of the stub from usage of pkvm hypercalls either side of __pkvm_init on the boot CPU. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211008135839.1193-4-will@kernel.org
|
#
78b497f2 |
|
03-Sep-2021 |
Juergen Gross <jgross@suse.com> |
kvm: use kvfree() in kvm_arch_free_vm() By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can use the common variant. This can be accomplished by adding another macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now. Further simplification can be achieved by adding __kvm_arch_free_vm() doing the common part. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-Id: <20210903130808.30142-5-jgross@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
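The resulting common helpers look roughly like this sketch, with x86 alone defining __KVM_HAVE_ARCH_VM_FREE:

    static inline void __kvm_arch_free_vm(struct kvm *kvm)
    {
        kvfree(kvm);    /* works for kmalloc'ed and vmalloc'ed VMs alike */
    }

    #ifndef __KVM_HAVE_ARCH_VM_FREE
    static inline void kvm_arch_free_vm(struct kvm *kvm)
    {
        __kvm_arch_free_vm(kvm);
    }
    #endif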
#
cd496228 |
|
17-Aug-2021 |
Fuad Tabba <tabba@google.com> |
KVM: arm64: Track value of cptr_el2 in struct kvm_vcpu_arch Track the baseline guest value for cptr_el2 in struct kvm_vcpu_arch, similar to the other registers that control traps. Use this value when setting cptr_el2 for the guest. Currently this value is unchanged (CPTR_EL2_DEFAULT), but future patches will set trapping bits based on features supported for the guest. No functional change intended. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210817081134.2918285-9-tabba@google.com
|
#
cf364e08 |
|
05-Aug-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Upgrade VMID accesses to {READ,WRITE}_ONCE Since TLB invalidation can run in parallel with VMID allocation, we need to be careful and avoid any sort of load/store tearing. Use {READ,WRITE}_ONCE consistently to avoid any surprise. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jade Alglave <jade.alglave@arm.com> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20210806113109.2475-6-will@kernel.org
|
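Illustratively, both sides of the race then use the single-copy-atomic accessors (sketch; 'vmid' stands for the per-VM VMID state):

    /* Allocation side: publish the new VMID without store tearing. */
    WRITE_ONCE(vmid->vmid, kvm_next_vmid);

    /* TLB invalidation side: read it back without load tearing. */
    u64 v = READ_ONCE(vmid->vmid);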
#
6caa5812 |
|
02-Aug-2021 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Use generic KVM xfer to guest work function Clean up handling of checks for pending work by switching to the generic infrastructure to do so. We pick up handling for TIF_NOTIFY_RESUME from this switch, meaning that task work will be correctly handled. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-4-oupton@google.com
|
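In the run loop this replaces the open-coded signal/resched checks with the generic entry helper; roughly:

    /* Handles pending signals, need_resched and TIF_NOTIFY_RESUME. */
    ret = xfer_to_guest_mode_handle_work(vcpu);
    if (!ret)
        ret = 1;    /* nothing pending: keep running the guest */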
#
fe5161d2 |
|
02-Aug-2021 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Record number of signal exits as a vCPU stat Most other architectures that implement KVM record a statistic indicating the number of times a vCPU has exited due to a pending signal. Add support for that stat to arm64. Reviewed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210802192809.1851010-2-oupton@google.com
|
#
6826c684 |
|
18-Aug-2021 |
Oliver Upton <oupton@google.com> |
KVM: arm64: Handle PSCI resets before userspace touches vCPU state The CPU_ON PSCI call takes a payload that KVM uses to configure a destination vCPU to run. This payload is non-architectural state and not exposed through any existing UAPI. Effectively, we have a race between CPU_ON and userspace saving/restoring a guest: if the target vCPU isn't run again before the VMM saves its state, the requested PC and context ID are lost. When restored, the target vCPU will be runnable and start executing at its old PC. We can avoid this race by making sure the reset payload is serviced before userspace can access a vCPU's state. Fixes: 358b28f09f0a ("arm/arm64: KVM: Allow a VCPU to fully reset itself") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210818202133.1106786-3-oupton@google.com
|
#
9329752b |
|
11-Aug-2021 |
Anshuman Khandual <anshuman.khandual@arm.com> |
KVM: arm64: Drop unused REQUIRES_VIRT This seems like a residue from the past. REQUIRES_VIRT is no longer available, so it can just be dropped along with the related code section. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-6-git-send-email-anshuman.khandual@arm.com
|
#
6b7982fe |
|
11-Aug-2021 |
Anshuman Khandual <anshuman.khandual@arm.com> |
KVM: arm64: Drop check_kvm_target_cpu() based percpu probe kvm_target_cpu() never returns a negative error code, so check_kvm_target() would never have 'ret' filled with a negative error code. Hence the percpu probe via check_kvm_target_cpu() does not make sense as it's never going to find an unsupported CPU, forcing kvm_arch_init() to exit early. Hence let's just drop this percpu probe (and also check_kvm_target_cpu()) altogether. While here, this also changes kvm_target_cpu() return type to a u32, making it explicit that an error code will not be returned from this function. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-5-git-send-email-anshuman.khandual@arm.com
|
#
bf249d9e |
|
11-Aug-2021 |
Anshuman Khandual <anshuman.khandual@arm.com> |
KVM: arm64: Drop init_common_resources() We can do without this additional indirection via init_common_resources() by simply calling kvm_set_ipa_limit() directly instead. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1628744994-16623-4-git-send-email-anshuman.khandual@arm.com
|
#
ad0e0139 |
|
09-Aug-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Remove __pkvm_mark_hyp Now that we mark memory owned by the hypervisor in the host stage-2 during __pkvm_init(), we no longer need to rely on the host to explicitly mark the hyp sections later on. Remove the __pkvm_mark_hyp() hypercall altogether. Signed-off-by: Quentin Perret <qperret@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210809152448.1810400-19-qperret@google.com
|
#
47e6223c |
|
02-Aug-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Unregister HYP sections from kmemleak in protected mode Booting a KVM host in protected mode with kmemleak quickly results in a pretty bad crash, as kmemleak doesn't know that the HYP sections have been taken away. This is especially true for the BSS section, which is part of the kernel BSS section and registered at boot time by kmemleak itself. Unregister the HYP part of the BSS before making that section HYP-private. The rest of the HYP-specific data is obtained via the page allocator or lives in other sections, none of which is subject to kmemleak. Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host") Reviewed-by: Quentin Perret <qperret@google.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.13 Link: https://lore.kernel.org/r/20210802123830.2195174-3-maz@kernel.org
|
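The fix boils down to one call before the section becomes HYP-private; a sketch:

    /*
     * Exclude the HYP BSS from kmemleak: once the section is donated to
     * the hypervisor, scanning it from the host would end badly.
     */
    kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);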
#
c4d7c518 |
|
29-Jul-2021 |
Steven Price <steven.price@arm.com> |
KVM: arm64: Fix race when enabling KVM_ARM_CAP_MTE When enabling KVM_CAP_ARM_MTE the ioctl checks that there are no VCPUs created to ensure that the capability is enabled before the VM is running. However no locks are held at that point so it is (theoretically) possible for another thread in the VMM to create VCPUs between the check and actually setting mte_enabled. Close the race by taking kvm->lock. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Fixes: 673638f434ee ("KVM: arm64: Expose KVM_ARM_CAP_MTE") Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210729160036.20433-1-steven.price@arm.com
|
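The closed race, as a sketch of the enable-cap path:

    case KVM_CAP_ARM_MTE:
        mutex_lock(&kvm->lock);
        /* created_vcpus can no longer change under us. */
        if (!system_supports_mte() || kvm->created_vcpus) {
            r = -EINVAL;
        } else {
            r = 0;
            kvm->arch.mte_enabled = true;
        }
        mutex_unlock(&kvm->lock);
        break;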
#
f0376edb |
|
20-Jun-2021 |
Steven Price <steven.price@arm.com> |
KVM: arm64: Add ioctl to fetch/store tags in a guest The VMM may not wish to have its own mapping of guest memory mapped with PROT_MTE because this causes problems if the VMM has tag checking enabled (the guest controls the tags in physical RAM and it's unlikely the tags are correct for the VMM). Instead add a new ioctl which allows the VMM to easily read/write the tags from guest memory, allowing the VMM's mapping to be non-PROT_MTE while the VMM can still read/write the tags for the purpose of migration. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-6-steven.price@arm.com
|
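The user-visible shape of the interface, as a sketch of the UAPI struct (field comments are interpretive; see the merged headers for the authoritative layout):

    struct kvm_arm_copy_mte_tags {
        __u64 guest_ipa;        /* where in guest memory, page-aligned */
        __u64 length;           /* bytes of guest memory, page-aligned */
        void __user *addr;      /* tag buffer in the VMM's address space */
        __u64 flags;            /* direction: to guest or from guest */
        __u64 reserved[2];
    };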
#
673638f4 |
|
20-Jun-2021 |
Steven Price <steven.price@arm.com> |
KVM: arm64: Expose KVM_ARM_CAP_MTE It's now safe for the VMM to enable MTE in a guest, so expose the capability to user space. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210621111716.37157-5-steven.price@arm.com
|
#
d0c94c49 |
|
03-Jun-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Restore PMU configuration on first run Restoring a guest with an active virtual PMU results in no perf counters being instantiated on the host side. Not quite what you'd expect from a restore. In order to fix this, force a writeback of PMCR_EL0 on the first run of a vcpu (using a new request so that it happens once the vcpu has been loaded). This will in turn create all the host-side counters that were missing. Reported-by: Jinank Jain <jinankj@amazon.de> Tested-by: Jinank Jain <jinankj@amazon.de> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
|
#
2f6a49bb |
|
08-Jun-2021 |
Will Deacon <will@kernel.org> |
KVM: arm64: Kill 32-bit vCPUs on systems with mismatched EL0 support If a vCPU is caught running 32-bit code on a system with mismatched support at EL0, then we should kill it. Acked-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210608180313.11502-4-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
fade9c2c |
|
24-May-2021 |
Fuad Tabba <tabba@google.com> |
arm64: Rename arm64-internal cache maintenance functions Although naming across the codebase isn't that consistent, it tends to follow certain patterns. Moreover, the term "flush" isn't defined in the Arm Architecture reference manual, and might be interpreted to mean clean, invalidate, or both for a cache. Rename arm64-internal functions to make the naming internally consistent, as well as making it consistent with the Arm ARM, by specifying whether it applies to the instruction, data, or both caches, whether the operation is a clean, invalidate, or both. Also specify which point the operation applies to, i.e., to the point of unification (PoU), coherency (PoC), or persistence (PoP). This commit applies the following sed transformation to all files under arch/arm64: "s/\b__flush_cache_range\b/caches_clean_inval_pou_macro/g;"\ "s/\b__flush_icache_range\b/caches_clean_inval_pou/g;"\ "s/\binvalidate_icache_range\b/icache_inval_pou/g;"\ "s/\b__flush_dcache_area\b/dcache_clean_inval_poc/g;"\ "s/\b__inval_dcache_area\b/dcache_inval_poc/g;"\ "s/__clean_dcache_area_poc\b/dcache_clean_poc/g;"\ "s/\b__clean_dcache_area_pop\b/dcache_clean_pop/g;"\ "s/\b__clean_dcache_area_pou\b/dcache_clean_pou/g;"\ "s/\b__flush_cache_user_range\b/caches_clean_inval_user_pou/g;"\ "s/\b__flush_icache_all\b/icache_inval_all_pou/g;" Note that __clean_dcache_area_poc is deliberately missing a word boundary check at the beginning in order to match the efistub symbols in image-vars.h. Also note that, despite its name, __flush_icache_range operates on both instruction and data caches. The name change here reflects that. No functional change intended. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20210524083001.2586635-19-tabba@google.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
e3e880bb |
|
26-May-2021 |
Zenghui Yu <yuzenghui@huawei.com> |
KVM: arm64: Resolve all pending PC updates before immediate exit Commit 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") fixed the PC updating issue by forcing an explicit synchronisation of the exception state on vcpu exit to userspace. However, we forgot to take into account the case where immediate_exit is set by userspace and KVM_RUN will exit immediately. Fix it by resolving all pending PC updates before returning to userspace. Since __kvm_adjust_pc() relies on a loaded vcpu context, I moved the immediate_exit checking right after vcpu_load(). We will get some overhead if immediate_exit is true (which should hopefully be rare). Fixes: 26778aaa134a ("KVM: arm64: Commit pending PC adjustemnts before returning to userspace") Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210526141831.1662-1-yuzenghui@huawei.com Cc: stable@vger.kernel.org # 5.11
|
#
26778aaa |
|
06-May-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Commit pending PC adjustemnts before returning to userspace KVM currently updates PC (and the corresponding exception state) using a two phase approach: first by setting a set of flags, then by converting these flags into a state update when the vcpu is about to enter the guest. However, this creates a disconnect with userspace if the vcpu thread returns there with any exception/PC flag set. In this case, the exposed context is wrong, as userspace doesn't have access to these flags (they aren't architectural). It also means that these flags are preserved across a reset, which isn't expected. To solve this problem, force an explicit synchronisation of the exception state on vcpu exit to userspace. As an optimisation for nVHE systems, only perform this when there is something pending. Reported-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Tested-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org # 5.11
|
#
6c9dd6d2 |
|
02-Apr-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: constify kvm_arch_flush_remote_tlbs_memslot memslots are stored in RCU and there should be no need to change them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fa18aca9 |
|
01-Apr-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: aarch64: implement KVM_CAP_SET_GUEST_DEBUG2 Move KVM_GUESTDBG_VALID_MASK to kvm_host.h and use it to return the value of this capability. Compile tested only. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401135451.1004564-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
263d6287 |
|
07-Apr-2021 |
Alexandru Elisei <alexandru.elisei@arm.com> |
KVM: arm64: Initialize VCPU mdcr_el2 before loading it When a VCPU is created, the kvm_vcpu struct is initialized to zero in kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is set to a sensible value in kvm_arm_setup_debug() later in the run loop. The result is that KVM executes for a short time with MDCR_EL2 set to zero. This has several unintended consequences: * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM DDI 0487G.a, page D13-3820. The behavior specified by the architecture in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a value less than or equal to PMCR_EL0.N, which means that an unknown number of counters are now disabled by MDCR_EL2.HPME, which is zero. * The host configuration for the other debug features controlled by MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux doesn't use the other fields, but that might change in the future. Let's avoid both issues by initializing the VCPU's mdcr_el2 field in kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register has a consistent value after each vcpu_load(). Fixes: d5a21bcc2995 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions") Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210407144857.199746-3-alexandru.elisei@arm.com
|
#
3bf72569 |
|
08-Dec-2020 |
Jianyong Wu <jianyong.wu@arm.com> |
KVM: arm64: Add support for the KVM PTP service Implement the hypervisor side of the KVM PTP interface. The service offers wall time and cycle count from host to guest. The caller must specify whether they want the host's view of either the virtual or physical counter. Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201209060932.212364-7-jianyong.wu@arm.com
|
#
d2602bb4 |
|
05-Apr-2021 |
Suzuki K Poulose <suzuki.poulose@arm.com> |
KVM: arm64: Move SPE availability check to VCPU load At the moment, we check the availability of SPE on the given CPU (i.e, SPE is implemented and is allowed at the host) during every guest entry. This can be optimized a bit by moving the check to vcpu_load time and recording the availability of the feature on the current CPU via a new flag. This will also be useful for adding the TRBE support. Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Alexandru Elisei <Alexandru.Elisei@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405164307.1720226-7-suzuki.poulose@arm.com Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
|
#
52b9e265 |
|
05-Apr-2021 |
Wang Wensheng <wangwensheng4@huawei.com> |
KVM: arm64: Fix error return code in init_hyp_mode() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: eeeee7193df0 ("KVM: arm64: Bootstrap PSCI SMC handler in nVHE EL2") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210406121759.5407-1-wangwensheng4@huawei.com
|
#
b1306fef |
|
31-Mar-2021 |
Xu Jia <xujia39@huawei.com> |
KVM: arm64: Make symbol '_kvm_host_prot_finalize' static The sparse tool complains as follows: arch/arm64/kvm/arm.c:1900:6: warning: symbol '_kvm_host_prot_finalize' was not declared. Should it be static? This symbol is not used outside of arm.c, so this commit marks it static. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Jia <xujia39@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1617176179-31931-1-git-send-email-xujia39@huawei.com
|
#
7c419937 |
|
22-Mar-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Drop the CPU_FTR_REG_HYP_COPY infrastructure Now that the read_ctr macro has been specialised for nVHE, the whole CPU_FTR_REG_HYP_COPY infrastructure looks completely overengineered. Simplify it by populating the two u64 quantities (MMFR0 and 1) that the hypervisor needs. Reviewed-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
90134ac9 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Protect the .hyp sections from the host When KVM runs in nVHE protected mode, use the host stage 2 to unmap the hypervisor sections by marking them as owned by the hypervisor itself. The long-term goal is to ensure the EL2 code can remain robust regardless of the host's state, so this starts by making sure the host cannot e.g. write to the .hyp sections directly. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-39-qperret@google.com
|
#
1025c8c0 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Wrap the host with a stage 2 When KVM runs in protected nVHE mode, make use of a stage 2 page-table to give the hypervisor some control over the host memory accesses. The host stage 2 is created lazily using large block mappings if possible, and will default to page mappings in the absence of a better solution. From this point on, memory accesses from the host to protected memory regions (e.g. not 'owned' by the host) are fatal and lead to hyp_panic(). Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-36-qperret@google.com
|
#
734864c1 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Set host stage 2 using kvm_nvhe_init_params Move the registers relevant to host stage 2 enablement to kvm_nvhe_init_params to prepare the ground for enabling it in later patches. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-22-qperret@google.com
|
#
bfa79a80 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Elevate hypervisor mappings creation at EL2 Previous commits have introduced infrastructure to enable the EL2 code to manage its own stage 1 mappings. However, this was preliminary work, and none of it is currently in use. Put all of this together by elevating the mapping creation at EL2 when memory protection is enabled. In this case, the host kernel running at EL1 still creates _temporary_ EL2 mappings, only used while initializing the hypervisor, but frees them right after. As such, all calls to create_hyp_mappings() after kvm init has finished turn into hypercalls, as the host now has no 'legal' way to modify the hypervisor page tables directly. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-19-qperret@google.com
|
#
bc1d2892 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Factor out vector address calculation In order to re-map the guest vectors at EL2 when pKVM is enabled, refactor __kvm_vector_slot2idx() and kvm_init_vector_slot() to move all the address calculation logic in a static inline function. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-16-qperret@google.com
|
#
380e18ad |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Introduce a BSS section for use at Hyp Currently, the hyp code cannot make full use of a bss, as the kernel section is mapped read-only. While this mapping could simply be changed to read-write, it would intermingle even more the hyp and kernel state than they currently are. Instead, introduce a __hyp_bss section, that uses reserved pages, and create the appropriate RW hyp mappings during KVM init. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-8-qperret@google.com
|
#
9cc77581 |
|
19-Mar-2021 |
Quentin Perret <qperret@google.com> |
KVM: arm64: Initialize kvm_nvhe_init_params early Move the initialization of kvm_nvhe_init_params in a dedicated function that is run early, and only once during KVM init, rather than every time the KVM vectors are set and reset. This also opens the opportunity for the hypervisor to change the init structs during boot, hence simplifying the replacement of host-provided page-table by the one the hypervisor will create for itself. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210319100146.1149909-5-qperret@google.com
|
#
6e94095c |
|
11-Mar-2021 |
Daniel Kiss <daniel.kiss@arm.com> |
KVM: arm64: Enable SVE support for nVHE Now that KVM is equipped to deal with SVE on nVHE, remove the code preventing it from being used as well as the bits of documentation that were mentioning the incompatibility. Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
01dc9262 |
|
03-Mar-2021 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM It recently became apparent that the ARMv8 architecture has interesting rules regarding attributes being used when fetching instructions if the MMU is off at Stage-1. In this situation, the CPU is allowed to fetch from the PoC and allocate into the I-cache (unless the memory is mapped with the XN attribute at Stage-2). If we transpose this to vcpus sharing a single physical CPU, it is possible for a vcpu running with its MMU off to influence another vcpu running with its MMU on, as the latter is expected to fetch from the PoU (and self-patching code doesn't flush below that level). In order to solve this, reuse the vcpu-private TLB invalidation code to apply the same policy to the I-cache, nuking it every time the vcpu runs on a physical CPU that ran another vcpu of the same VM in the past. This involves renaming __kvm_tlb_flush_local_vmid() to __kvm_flush_cpu_context(), and inserting a local i-cache invalidation there. Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20210303164505.68492-1-maz@kernel.org
|
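The policy lands in the vcpu_load path; roughly, as a sketch:

    last_ran = this_cpu_ptr(mmu->last_vcpu_ran);

    /*
     * Another vcpu of this VM ran on this physical CPU last: its TLB
     * and I-cache footprint must not leak into this vcpu, so nuke both.
     */
    if (*last_ran != vcpu->vcpu_id) {
        kvm_call_hyp(__kvm_flush_cpu_context, mmu);
        *last_ran = vcpu->vcpu_id;
    }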
#
1945a067 |
|
08-Feb-2021 |
Marc Zyngier <maz@kernel.org> |
arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0 Admittedly, passing id_aa64mmfr1.vh=0 on the command-line isn't that easy to understand, and it is likely that users would much prefer to write "kvm-arm.mode=nvhe", or "...=protected". So here you go. This has the added advantage that we can now always honor the "kvm-arm.mode=protected" option, even when booting on a VHE system. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: David Brazdil <dbrazdil@google.com> Link: https://lore.kernel.org/r/20210208095732.3267263-18-maz@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
|
#
16174eea |
|
05-Jan-2021 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Set up .hyp.rodata ELF section We will need to recognize pointers in .rodata specific to hyp, so establish a .hyp.rodata ELF section. Merge it with the existing .hyp.data..ro_after_init as they are treated the same at runtime. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210105180541.65031-3-dbrazdil@google.com
|
#
e1663372 |
|
08-Jan-2021 |
Steven Price <steven.price@arm.com> |
KVM: arm64: Compute TPIDR_EL2 ignoring MTE tag KASAN in HW_TAGS mode will store MTE tags in the top byte of the pointer. When computing the offset for TPIDR_EL2 we don't want anything in the top byte, so remove the tag to ensure the computation is correct no matter what the tag. Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS") Signed-off-by: Steven Price <steven.price@arm.com> [maz: added comment] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210108161254.53674-1-steven.price@arm.com
|
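The fix is a single tag-stripping step in the offset computation; a sketch of the idea, assuming the per-CPU offset is derived as described above:

    /*
     * With KASAN_HW_TAGS the per-CPU pointer carries an MTE tag in its
     * top byte; strip it so the TPIDR_EL2 offset arithmetic is exact.
     */
    params->tpidr_el2 =
        (unsigned long)kasan_reset_tag(per_cpu_ptr_nvhe_sym(__per_cpu_start, cpu)) -
        (unsigned long)kvm_ksym_ref(CHOOSE_NVHE_SYM(__per_cpu_start));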
#
44362a3c |
|
22-Dec-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Fix hyp_cpu_pm_{init,exit} __init annotation The __init annotations on hyp_cpu_pm_{init,exit} are obviously incorrect, and the build system shouts at you if you enable DEBUG_SECTION_MISMATCH. Nothing really bad happens as we never execute that code outside of the init context, but we can't label the callers as __init either, as kvm_init isn't __init itself. Oh well. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20201223120854.255347-1-maz@kernel.org
|
#
1c91f06d |
|
01-Dec-2020 |
Alexandru Elisei <alexandru.elisei@arm.com> |
KVM: arm64: Move double-checked lock to kvm_vgic_map_resources() kvm_vgic_map_resources() is called when a VCPU is first run and it maps all the VGIC MMIO regions. To prevent double-initialization, the VGIC uses the ready variable to keep track of the state of resources and the global KVM mutex to protect against concurrent accesses. After the lock is taken, the variable is checked again in case another VCPU took the lock between the current VCPU reading ready equals false and taking the lock. The double-checked lock pattern is spread across four different functions: in kvm_vcpu_first_run_init(), in kvm_vgic_map_resources() and in vgic_{v2,v3}_map_resources(), which makes it hard to reason about and introduces minor code duplication. Consolidate the checks in kvm_vgic_map_resources(), where the lock is taken. No functional change intended. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201201150157.223625-4-alexandru.elisei@arm.com
|
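Consolidated, the double-checked pattern lives in one place; roughly:

    int kvm_vgic_map_resources(struct kvm *kvm)
    {
        struct vgic_dist *dist = &kvm->arch.vgic;
        int ret = 0;

        if (likely(vgic_ready(kvm)))
            return 0;

        mutex_lock(&kvm->lock);
        if (vgic_ready(kvm))            /* lost the race: already done */
            goto out;

        if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V2)
            ret = vgic_v2_map_resources(kvm);
        else
            ret = vgic_v3_map_resources(kvm);
    out:
        mutex_unlock(&kvm->lock);
        return ret;
    }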
#
767c973f |
|
21-Dec-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Declutter host PSCI 0.1 handling Although there is nothing wrong with the current host PSCI relay implementation, we can clean it up and remove some of the helpers that do not improve the overall readability of the legacy PSCI 0.1 handling. Opportunity is taken to turn the bitmap into a set of booleans, and creative use of preprocessor macros make init and check more concise/readable. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
61fe0c37 |
|
08-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Minor cleanup of hyp variables used in host Small cleanup moving declarations of hyp-exported variables to kvm_host.h and using macros to avoid having to refer to them with kvm_nvhe_sym() in host. No functional change intended. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201208142452.87237-5-dbrazdil@google.com
|
#
ff367fe4 |
|
08-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Prevent use of invalid PSCI v0.1 function IDs PSCI driver exposes a struct containing the PSCI v0.1 function IDs configured in the DT. However, the struct does not convey the information whether these were set from DT or contain the default value zero. This could be a problem for PSCI proxy in KVM protected mode. Extend config passed to KVM with a bit mask with individual bits set depending on whether the corresponding function pointer in psci_ops is set, eg. set bit for PSCI_CPU_SUSPEND if psci_ops.cpu_suspend != NULL. Previously config was split into multiple global variables. Put everything into a single struct for convenience. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201208142452.87237-2-dbrazdil@google.com
|
#
f19f6644 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Fix EL2 mode availability checks With the protected nVHE hyp code intercepting the host's PSCI SMCs, the host starts seeing new CPUs boot in EL1 instead of EL2. The kernel logic that keeps track of the boot mode needs to be adjusted. Add a static key enabled if KVM protected mode initialization is successful. When the key is enabled, is_hyp_mode_available continues to report `true` because its users either treat it as a check whether KVM will be / was initialized, or whether stub HVCs can be made (eg. hibernate). is_hyp_mode_mismatched is changed to report `false` when the key is enabled. That's because all cores' modes matched at the point of KVM init and KVM will not allow cores not present at init to boot. That said, the function is never used after KVM is initialized. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-27-dbrazdil@google.com
|
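The adjusted availability check, as a sketch:

    static inline bool is_hyp_mode_available(void)
    {
        /*
         * Once protected-mode init has succeeded, all CPUs were verified
         * to have booted in EL2, even though they now come up in EL1.
         */
        if (IS_ENABLED(CONFIG_KVM) &&
            static_branch_likely(&kvm_protected_mode_initialized))
            return true;

        return (__boot_cpu_mode[0] == BOOT_CPU_MODE_EL2 &&
                __boot_cpu_mode[1] == BOOT_CPU_MODE_EL2);
    }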
#
fa8c3d65 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Keep nVHE EL2 vector installed KVM by default keeps the stub vector installed and installs the nVHE vector only briefly for init and later on demand. Change this policy to install the vector at init and then never uninstall it if the kernel was given the protected KVM command line parameter. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-25-dbrazdil@google.com
|
#
eeeee719 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Bootstrap PSCI SMC handler in nVHE EL2 Add a handler of PSCI SMCs in nVHE hyp code. The handler is initialized with the version used by the host's PSCI driver and the function IDs it was configured with. If the SMC function ID matches one of the configured PSCI calls (for v0.1) or falls into the PSCI function ID range (for v0.2+), the SMC is handled by the PSCI handler. For now, all SMCs return PSCI_RET_NOT_SUPPORTED. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-17-dbrazdil@google.com
|
#
94f5e8a4 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Create nVHE copy of cpu_logical_map When KVM starts validating host's PSCI requests, it will need to map MPIDR back to the CPU ID. To this end, copy cpu_logical_map into nVHE hyp memory when KVM is initialized. Only copy the information for CPUs that are online at the point of KVM initialization so that KVM rejects CPUs whose features were not checked against the finalized capabilities. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-15-dbrazdil@google.com
|
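The copy itself is a short loop at init time; a sketch (the hyp-side array name here is illustrative):

    static void init_cpu_logical_map(void)
    {
        unsigned int cpu;

        /* Only online CPUs: their features were checked at finalization. */
        for_each_online_cpu(cpu)
            hyp_cpu_logical_map[cpu] = cpu_logical_map(cpu);
    }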
#
2d7bf218 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Add .hyp.data..ro_after_init ELF section Add rules for renaming the .data..ro_after_init ELF section in KVM nVHE object files to .hyp.data..ro_after_init, linking it into the kernel and mapping it in hyp at runtime. The section is RW to the host, then mapped RO in hyp. The expectation is that the host populates the variables in the section and they are never changed by hyp afterwards. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-13-dbrazdil@google.com
|
#
d3e1086c |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Init MAIR/TCR_EL2 from params struct MAIR_EL2 and TCR_EL2 are currently initialized from their _EL1 values. This will not work once KVM starts intercepting PSCI ON/SUSPEND SMCs and initializing EL2 state before EL1 state. Obtain the EL1 values during KVM init and store them in the init params struct. The struct will stay in memory and can be used when booting new cores. Take the opportunity to move copying the T0SZ value from idmap_t0sz in KVM init rather than in .hyp.idmap.text. This avoids the need for the idmap_t0sz symbol alias. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-12-dbrazdil@google.com
|
#
63fec243 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Move hyp-init params to a per-CPU struct Once we start initializing KVM on newly booted cores before the rest of the kernel, parameters to __do_hyp_init will need to be provided by EL2 rather than EL1. At that point it will not be possible to pass its three arguments directly because PSCI_CPU_ON only supports one context argument. Refactor __do_hyp_init to accept its parameters in a struct. This prepares the code for KVM booting cores as well as removes any limits on the number of __do_hyp_init arguments. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-11-dbrazdil@google.com
|
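Roughly, the former arguments move into a per-CPU struct; a sketch (later patches in the series grow this struct with more EL2 state):

    struct kvm_nvhe_init_params {
        unsigned long tpidr_el2;    /* per-CPU offset for EL2 */
        unsigned long stack_hyp_va; /* hyp VA of the EL2 stack */
        phys_addr_t   pgd_pa;       /* PA of the hyp page tables */
    };

    static DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);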
#
5be1d622 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Remove vector_ptr param of hyp-init KVM precomputes the hyp VA of __kvm_hyp_host_vector, essentially a constant (minus ASLR), before passing it to __kvm_hyp_init. Now that we have alternatives for converting kimg VA to hyp VA, replace this with computing the constant inside __kvm_hyp_init, thus removing the need for an argument. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-10-dbrazdil@google.com
|
#
3eb681fb |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Add ARM64_KVM_PROTECTED_MODE CPU capability Expose the boolean value whether the system is running with KVM in protected mode (nVHE + kernel param). CPU capability was selected over a global variable to allow use in alternatives. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-3-dbrazdil@google.com
|
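Usage then compiles down to an alternative-patched capability check; a sketch:

    static inline bool is_protected_kvm_enabled(void)
    {
        if (IS_ENABLED(CONFIG_KVM))
            return cpus_have_final_cap(ARM64_KVM_PROTECTED_MODE);
        else
            return false;
    }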
#
d8b369c4 |
|
02-Dec-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Add kvm-arm.mode early kernel parameter Add an early parameter that allows users to select the mode of operation for KVM/arm64. For now, the only supported value is "protected". By passing this flag users opt into the hypervisor placing additional restrictions on the host kernel. These allow the hypervisor to spawn guests whose state is kept private from the host. Restrictions will include stage-2 address translation to prevent host from accessing guest memory, filtering its SMC calls, etc. Without this parameter, the default behaviour remains selecting VHE/nVHE based on hardware support and CONFIG_ARM64_VHE. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201202184122.26046-2-dbrazdil@google.com
|
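The parameter parsing is a standard early_param hook; a sketch, assuming KVM_MODE_DEFAULT/KVM_MODE_PROTECTED enumerators:

    static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT;

    static int __init early_kvm_mode_cfg(char *arg)
    {
        if (!arg)
            return -EINVAL;

        if (strcmp(arg, "protected") == 0) {
            kvm_mode = KVM_MODE_PROTECTED;
            return 0;
        }

        return -EINVAL;
    }
    early_param("kvm-arm.mode", early_kvm_mode_cfg);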
#
4f1df628 |
|
26-Nov-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Advertise ID_AA64PFR0_EL1.CSV3=1 if the CPUs are Meltdown-safe Cores that predate the introduction of ID_AA64PFR0_EL1.CSV3 to the ARMv8 architecture have this field set to 0, even if some of them are not affected by the vulnerability. The kernel maintains a list of unaffected cores (A53, A55 and a few others) so that it doesn't impose an expensive mitigation unnecessarily. As we do for CSV2, let's expose the CSV3 property to guests that run on HW that is effectively not vulnerable. This can be reset to zero by writing to the ID register from userspace, ensuring that VMs can be migrated despite the new property being set. Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
36fb4cd5 |
|
18-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Remove kvm_arch_vm_ioctl_check_extension() kvm_arch_vm_ioctl_check_extension() is only called from kvm_vm_ioctl_check_extension(), so we can inline it and remove the extra function. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201118194402.2892-3-will@kernel.org
|
#
4f6a36fe |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Remove redundant hyp vectors entry The hyp vectors entry corresponding to HYP_VECTOR_DIRECT (i.e. when neither Spectre-v2 nor Spectre-v3a are present) is unused, as we can simply dispatch straight to __kvm_hyp_vector in this case. Remove the redundant vector, and massage the logic for resolving a slot to a vectors entry. Reported-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201113113847.21619-11-will@kernel.org
|
#
c4792b6d |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
arm64: spectre: Rename ARM64_HARDEN_EL2_VECTORS to ARM64_SPECTRE_V3A Since ARM64_HARDEN_EL2_VECTORS is really a mitigation for Spectre-v3a, rename it accordingly for consistency with the v2 and v4 mitigation. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-9-will@kernel.org
|
#
b881cdce |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Allocate hyp vectors statically The EL2 vectors installed when a guest is running point at one of the following configurations for a given CPU: - Straight at __kvm_hyp_vector - A trampoline containing an SMC sequence to mitigate Spectre-v2 and then a direct branch to __kvm_hyp_vector - A dynamically-allocated trampoline which has an indirect branch to __kvm_hyp_vector - A dynamically-allocated trampoline containing an SMC sequence to mitigate Spectre-v2 and then an indirect branch to __kvm_hyp_vector The indirect branches mean that VA randomization at EL2 isn't trivially bypassable using Spectre-v3a (where the vector base is readable by the guest). Rather than populate these vectors dynamically, configure everything statically and use an enumerated type to identify the vector "slot" corresponding to one of the configurations above. This both simplifies the code, but also makes it much easier to implement at EL2 later on. Signed-off-by: Will Deacon <will@kernel.org> [maz: fixed double call to kvm_init_vector_slots() on nVHE] Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-8-will@kernel.org
|
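The four configurations map onto an enumerated slot type; a sketch:

    enum arm64_hyp_spectre_vector {
        HYP_VECTOR_DIRECT,           /* straight to __kvm_hyp_vector */
        HYP_VECTOR_SPECTRE_DIRECT,   /* SMC mitigation + direct branch */
        HYP_VECTOR_INDIRECT,         /* indirect branch (hides the VA) */
        HYP_VECTOR_SPECTRE_INDIRECT, /* SMC mitigation + indirect branch */
    };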
#
6279017e |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Move BP hardening helpers into spectre.h The BP hardening helpers are an integral part of the Spectre-v2 mitigation, so move them into asm/spectre.h and inline the arm64_get_bp_hardening_data() function at the same time. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-6-will@kernel.org
|
#
07cf8aa9 |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Make BP hardening globals static instead Branch predictor hardening of the hyp vectors is partially driven by a couple of global variables ('__kvm_bp_vect_base' and '__kvm_harden_el2_vector_slot'). However, these are only used within a single compilation unit, so internalise them there instead. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-5-will@kernel.org
|
#
042c76a9 |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Move kvm_get_hyp_vector() out of header file kvm_get_hyp_vector() has only one caller, so move it out of kvm_mmu.h and inline it into a new function, cpu_set_hyp_vector(), for setting the vector. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-4-will@kernel.org
|
#
de5bcdb4 |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Tidy up kvm_map_vector() The bulk of the work in kvm_map_vector() is conditional on the ARM64_HARDEN_EL2_VECTORS capability, so return early if that is not set and make the code a bit easier to read. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-3-will@kernel.org
|
#
8934c845 |
|
13-Nov-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Remove redundant Spectre-v2 code from kvm_map_vector() '__kvm_bp_vect_base' is only used when dealing with the hardened vectors so remove the redundant assignments in kvm_map_vectors(). Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20201113113847.21619-2-will@kernel.org
|
#
23711a5e |
|
10-Nov-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Allow setting of ID_AA64PFR0_EL1.CSV2 from userspace We now expose ID_AA64PFR0_EL1.CSV2=1 to guests running on hosts that are immune to Spectre-v2, but that don't have this field set, most likely because they predate the specification. However, this prevents the migration of guests that have started on a host that doesn't fake this CSV2 setting to one that does, as KVM rejects the write to ID_AA64PFR0_EL1 on the grounds that it isn't what is already there. In order to fix this, allow userspace to set this field as long as this doesn't result in promising more than what is already there (setting CSV2 to 0 is acceptable, but setting it to 1 when it is already set to 0 isn't). Fixes: e1026237f9067 ("KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2") Reported-by: Peng Liang <liangpeng10@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201110141308.451654-2-maz@kernel.org
|
#
6ac4a5ac |
|
02-Nov-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Drop kvm_coproc.h kvm_coproc.h used to serve as a compatibility layer for the files shared between the 32 and 64 bit ports. Another one bites the dust... Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
22f55384 |
|
27-Oct-2020 |
Qais Yousef <qais.yousef@arm.com> |
KVM: arm64: Handle Asymmetric AArch32 systems On a system without uniform support for AArch32 at EL0, it is possible for the guest to force run AArch32 at EL0 and potentially cause an illegal exception if running on a core without AArch32. Add an extra check so that if we catch the guest doing that, then we prevent it from running again by resetting vcpu->arch.target and returning ARM_EXCEPTION_IL. We try to catch this misbehaviour as early as possible and not rely on an illegal exception occurring to signal the problem. Attempting to run a 32bit app in the guest will produce an error from QEMU if the guest exits while running in AArch32 EL0. Tested on Juno by instrumenting the host to fake asym aarch32 and instrumenting KVM to make the asymmetry visible to the guest. [will: Incorporated feedback from Marc] Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201021104611.2744565-2-qais.yousef@arm.com Link: https://lore.kernel.org/r/20201027215118.27003-2-will@kernel.org
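A sketch of the exit-path check this adds (placement and surrounding code are approximated):

    /* Sketch: if the vCPU exited while in AArch32 EL0 on a system that
     * cannot uniformly run 32-bit code, refuse to ever run it again. */
    if (vcpu_mode_is_32bit(vcpu) && !system_supports_32bit_el0()) {
        vcpu->arch.target = -1;        /* invalidate the vCPU target */
        ret = ARM_EXCEPTION_IL;
    }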
|
#
96d389ca |
|
28-Oct-2020 |
Rob Herring <robh@kernel.org> |
arm64: Add workaround for Arm Cortex-A77 erratum 1508412 On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected system. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
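A sketch of the guarded PAR_EL1 accessor the workaround implies; the real kernel patches the barriers in via alternatives so unaffected cores pay nothing, and this unconditional form is a simplification:

    static inline u64 read_sysreg_par(void)
    {
        u64 par;

        /* Erratum 1508412: full-system barriers on both sides of the
         * PAR_EL1 read break the deadlock-prone sequence. */
        dmb(sy);
        par = read_sysreg(par_el1);
        dmb(sy);
        return par;
    }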
|
#
a3bb9c3a |
|
22-Sep-2020 |
David Brazdil <dbrazdil@google.com> |
kvm: arm64: Remove unnecessary hyp mappings With all nVHE per-CPU variables being part of the hyp per-CPU region, mapping them individually is no longer necessary. They are mapped to hyp as part of the overall per-CPU region. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Andrew Scull <ascull@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-11-dbrazdil@google.com
|
#
30c95391 |
|
22-Sep-2020 |
David Brazdil <dbrazdil@google.com> |
kvm: arm64: Set up hyp percpu data for nVHE Add hyp percpu section to linker script and rename the corresponding ELF sections of hyp/nvhe object files. This moves all nVHE-specific percpu variables to the new hyp percpu section. Allocate a sufficient amount of memory for all percpu hyp regions at global KVM init time and create corresponding hyp mappings. The base addresses of hyp percpu regions are kept in a dynamically allocated array in the kernel. Add NULL checks in PMU event-reset code as it may run before KVM memory is initialized. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-10-dbrazdil@google.com
|
#
2a1198c9 |
|
22-Sep-2020 |
David Brazdil <dbrazdil@google.com> |
kvm: arm64: Create separate instances of kvm_host_data for VHE/nVHE Host CPU context is stored in a global per-cpu variable `kvm_host_data`. In preparation for introducing independent per-CPU region for nVHE hyp, create two separate instances of `kvm_host_data`, one for VHE and one for nVHE. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-9-dbrazdil@google.com
|
#
df4c8214 |
|
22-Sep-2020 |
David Brazdil <dbrazdil@google.com> |
kvm: arm64: Duplicate arm64_ssbd_callback_required for nVHE hyp Hyp keeps track of which cores require SSBD callback by accessing a kernel-proper global variable. Create an nVHE symbol of the same name and copy the value from kernel proper to nVHE as KVM is being enabled on a core. Done in preparation for separating percpu memory owned by kernel proper and nVHE. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-8-dbrazdil@google.com
|
#
9ef2b48b |
|
28-Sep-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Allow patching EL2 vectors even when KASLR is not enabled Patching the EL2 exception vectors is integral to the Spectre-v2 workaround, where it can be necessary to execute CPU-specific sequences to nobble the branch predictor before running the hypervisor text proper. Remove the dependency on CONFIG_RANDOMIZE_BASE and allow the EL2 vectors to be patched even when KASLR is not enabled. Fixes: 7a132017e7a5 ("KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE") Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/202009221053.Jv1XsQUZ%lkp@intel.com Signed-off-by: Will Deacon <will@kernel.org>
|
#
d63d975a |
|
18-Sep-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() Convert the KVM WA2 code to using the Spectre infrastructure, making the code much more readable. It also allows us to take SSBS into account for the mitigation. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
|
#
29e8910a |
|
17-Sep-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 Owing to the fact that the host kernel is always mitigated, we can drastically simplify the WA2 handling by keeping the mitigation state ON when entering the guest. This means the guest is either unaffected or not mitigated. This results in a nice simplification of the mitigation space, and the removal of a lot of code that was never really used anyway. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
|
#
d7eec236 |
|
12-Feb-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Add PMU event filtering infrastructure It can be desirable to expose a PMU to a guest, and yet not want the guest to be able to count some of the implemented events (because this would give information on shared resources, for example). For this, let's extend the PMUv3 device API, and offer a way to set up a bitmap of the allowed events (the default being no bitmap, and thus no filtering). Userspace can thus allow/deny ranges of events. The default policy depends on the "polarity" of the first filter setup (default deny if the filter allows events, and default allow if the filter denies events). This allows setting up exactly what is allowed for a given guest. Note that although the ioctl is per-vcpu, the map of allowed events is global to the VM (it can be set up from any vcpu until the vcpu PMU is initialized). Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
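A sketch of how such a filter gates event programming (field name as described above; error handling elided):

    /* Sketch: no bitmap means no filtering; otherwise an event may only
     * be programmed if its bit is set in the VM-wide filter. */
    static bool kvm_pmu_event_allowed(struct kvm *kvm, u16 eventsel)
    {
        if (!kvm->arch.pmu_filter)
            return true;

        return test_bit(eventsel, kvm->arch.pmu_filter);
    }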
|
#
04e4caa8 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: nVHE: Migrate hyp-init to SMCCC To complete the transition to SMCCC, the hyp initialization is given a function ID. This looks neater than comparing the hyp stub function IDs to the page table physical address. Some care is taken to only clobber x0-3 before the host context is saved as only those registers can be clobbered according to SMCCC. Fortunately, only a few acrobatics are needed. The possible new tpidr_el2 is moved to the argument in x2 so that it can be stashed in tpidr_el2 early to free up a scratch register. The page table configuration then makes use of x0-2. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-19-ascull@google.com
|
#
05469831 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: nVHE: Migrate hyp interface to SMCCC Rather than passing arbitrary function pointers to run at hyp, define an equivalent set of SMCCC functions. Since the SMCCC functions are strongly tied to the original function prototypes, it is not expected for the host to ever call an invalid ID but a warning is raised if this does ever occur. As __kvm_vcpu_run is used for every switch between the host and a guest, it is explicitly singled out to be identified before the other function IDs to improve the performance of the hot path. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-18-ascull@google.com
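A sketch of the resulting host call dispatch with the hot path singled out; the macro and handler names here are illustrative rather than exact:

    /* Sketch: test the hottest function ID first, then fall back to a
     * switch for the rest; unknown IDs only raise a warning. */
    static void handle_host_hcall(unsigned long func_id,
                                  struct kvm_cpu_context *host_ctxt)
    {
        if (func_id == KVM_HOST_SMCCC_FUNC(__kvm_vcpu_run)) {
            handle___kvm_vcpu_run(host_ctxt);    /* hot path */
            return;
        }

        switch (func_id) {
        /* ... one case per hyp entry point ... */
        default:
            /* invalid ID: warn, but do not panic the host */
            break;
        }
    }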
|
#
5dc33bd1 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: nVHE: Pass pointers consistently to hyp-init Rather than some being kernel pointers and others being hyp pointers, standardize on all pointers being hyp pointers. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-15-ascull@google.com
|
#
b619d9aa |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: Introduce hyp context During __guest_enter, save and restore from a new hyp context rather than the host context. This is preparation for separation of the hyp and host context in nVHE. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-9-ascull@google.com
|
#
6e3bfbb2 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: nVHE: Use separate vector for the host The host is treated differently from the guests when an exception is taken so introduce a separate vector that is specialized for the host. This also allows the nVHE specific code to move out of hyp-entry.S and into nvhe/host.S. The host is only expected to make HVC calls and anything else is considered invalid and results in a panic. Hyp initialization is now passed the vector that is used for the host and it is swapped for the guest vector during the context switch. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-7-ascull@google.com
|
#
a0e47952 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: Save chosen hyp vector to a percpu variable Introduce a percpu variable to hold the address of the selected hyp vector that will be used with guests. This avoids the selection process each time a guest is being entered and can be used by nVHE when a separate vector is introduced for the host. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-6-ascull@google.com
|
#
d7ca1079 |
|
15-Sep-2020 |
Andrew Scull <ascull@google.com> |
KVM: arm64: Remove kvm_host_data_t typedef The kvm_host_data_t typedef is used inconsistently and goes against the kernel's coding style. Remove it in favour of the full struct specifier. Signed-off-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200915104643.2543892-4-ascull@google.com
|
#
9af3e08b |
|
11-Sep-2020 |
Will Deacon <will@kernel.org> |
KVM: arm64: Remove kvm_mmu_free_memory_caches() kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(), so inline the implementation and get rid of the extra function. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200911132529.19844-2-will@kernel.org
|
#
004a0124 |
|
04-Aug-2020 |
Andrew Jones <drjones@redhat.com> |
arm64/x86: KVM: Introduce steal-time cap arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe support for steal-time. However this is unnecessary, as only a KVM fd is required, and it complicates userspace (userspace may prefer delaying vcpu creation until after feature probing). Introduce a cap that can be checked instead. While x86 can already probe steal-time support with a kvm fd (KVM_GET_SUPPORTED_CPUID), we add the cap there too for consistency. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20200804170604.42662-7-drjones@redhat.com
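A userspace sketch of the new probing flow, needing only the KVM fd (error handling elided):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm_fd = open("/dev/kvm", O_RDWR);

        /* Positive if steal-time is supported; no vCPU (or even VM) fd
         * needs to exist yet. */
        int steal_time = ioctl(kvm_fd, KVM_CHECK_EXTENSION,
                               KVM_CAP_STEAL_TIME);

        return steal_time > 0 ? 0 : 1;
    }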
|
#
abf532cc |
|
03-Aug-2020 |
Rob Herring <robh@kernel.org> |
KVM: arm64: Print warning when cpu erratum can cause guests to deadlock If guests don't have certain CPU erratum workarounds implemented, then there is a possibility a guest can deadlock the system. IOW, only trusted guests should be used on systems with the erratum. This is the case for Cortex-A57 erratum 832075. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200803193127.3012242-2-robh@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
#
74cc7e0c |
|
23-Jun-2020 |
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> |
KVM: arm64: clean up redundant 'kvm_run' parameters In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu' structure. For historical reasons, many kvm-related function parameters retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This patch does a unified cleanup of these remaining redundant parameters. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e539451b |
|
02-Jul-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: arm64: Use common code's approach for __GFP_ZERO with memory caches Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make the struct and its usage compatible with the common 'struct kvm_mmu_memory_cache' in linux/kvm_host.h. This will minimize code churn when arm64 moves to the common implementation in a future patch, at the cost of temporarily having somewhat silly code. Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO: #define GFP_PGTABLE_USER (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT) | -> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO) == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO versus #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) with __GFP_ZERO explicitly OR'd in == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO No functional change intended. Tested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-18-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
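A sketch of the shape this gives the arm64 cache, temporarily "silly" as noted since the flag is always __GFP_ZERO there:

    /* Sketch: the cache carries its zeroing flag instead of baking
     * GFP_PGTABLE_USER into the allocation site. */
    struct kvm_mmu_memory_cache {
        int     nobjs;
        gfp_t   gfp_zero;                   /* __GFP_ZERO on arm64 */
        void    *objects[KVM_NR_MEM_OBJS];
    };

    /* ...at topup time: */
    page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT | cache->gfp_zero);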
|
#
3c5ff0c6 |
|
22-Apr-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_user kvm_timer_sync_hwstate() has nothing to do with the timer HW state, but more to do with the state of a userspace interrupt controller. Change the suffix from _hwstate to _user, in keeping with the rest of the code. Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
a0e50aa3 |
|
04-Jan-2019 |
Christoffer Dall <christoffer.dall@arm.com> |
KVM: arm64: Factor out stage 2 page table data from struct kvm As we are about to reuse our stage 2 page table manipulation code for shadow stage 2 page tables in the context of nested virtualization, we are going to manage multiple stage 2 page tables for a single VM. This requires some pretty invasive changes to our data structures, which moves the vmid and pgd pointers into a separate structure and changes pretty much all of our mmu code to operate on this structure instead. The new structure is called struct kvm_s2_mmu. There is no intended functional change by this patch alone. Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> [Designed data structure layout in collaboration] Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Co-developed-by: Marc Zyngier <maz@kernel.org> [maz: Moved the last_vcpu_ran down to the S2 MMU structure as well] Signed-off-by: Marc Zyngier <maz@kernel.org>
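A sketch of the new structure (the field set follows the commit text; the exact layout is an approximation):

    struct kvm_s2_mmu {
        struct kvm_vmid vmid;           /* moved out of struct kvm_arch */
        pgd_t           *pgd;           /* stage 2 page table root */
        phys_addr_t     pgd_phys;
        int __percpu    *last_vcpu_ran; /* moved down per the note above */
        struct kvm      *kvm;           /* back-pointer to the VM */
    };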
|
#
95fa0ba8 |
|
01-Jul-2020 |
Peng Hao <richard.peng@oppo.com> |
KVM: arm64: Drop long gone function parameter documentation update_vmid() just has one parameter, "vmid". The other parameter, "kvm", is no longer used. Signed-off-by: Peng Hao <richard.peng@oppo.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200701120709.388377-1-richard.peng@oppo.com
|
#
13aeb9b4 |
|
25-Jun-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Split hyp/sysreg-sr.c to VHE/nVHE sysreg-sr.c contains KVM's code for saving/restoring system registers, with some code shared between VHE/nVHE. These common routines are moved to a header file, VHE-specific code is moved to vhe/sysreg-sr.c and nVHE-specific code to nvhe/sysreg-sr.c. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200625131420.71444-12-dbrazdil@google.com
|
#
09cf57eb |
|
25-Jun-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Split hyp/switch.c to VHE/nVHE switch.c implements context-switching for KVM, with large parts shared between VHE/nVHE. These common routines are moved to a header file, VHE-specific code is moved to vhe/switch.c and nVHE-specific code is moved to nvhe/switch.c. Previously __kvm_vcpu_run needed a different symbol name for VHE/nVHE. This is cleaned up and the caller in arm.c simplified. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200625131420.71444-10-dbrazdil@google.com
|
#
b877e984 |
|
25-Jun-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Build hyp-entry.S separately for VHE/nVHE hyp-entry.S contains implementation of KVM hyp vectors. This code is mostly shared between VHE/nVHE, therefore compile it under both VHE and nVHE build rules. nVHE-specific host HVC handler is hidden behind __KVM_NVHE_HYPERVISOR__. Adjust code which selects which KVM hyp vecs to install to choose the correct VHE/nVHE symbol. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200625131420.71444-7-dbrazdil@google.com
|
#
07da1ffa |
|
05-Jun-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Remove host_cpu_context member from vcpu structure For a very long time, we have kept this pointer back to the per-cpu host state, despite having working per-cpu accessors at EL2 for some time now. Recent investigations have shown that this pointer is easy to abuse in preemptible context, which is a sure sign that it would better be gone. Not to mention that a per-cpu pointer is faster to access at all times. Reported-by: Andrew Scull <ascull@google.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrew Scull <ascull@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
29eb5a3c |
|
04-Jun-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Handle PtrAuth traps early The current way we deal with PtrAuth is a bit heavy handed: - We forcefully save the host's keys on each vcpu_load() - Handling the PtrAuth trap forces us to go all the way back to the exit handling code to just set the HCR bits Overall, this is pretty cumbersome. A better approach would be to handle it the same way we deal with the FPSIMD registers: - On vcpu_load() disable PtrAuth for the guest - On first use, save the host's keys, enable PtrAuth in the guest Crucially, this can happen as a fixup, which is done very early on exit. We can then reenter the guest immediately without leaving the hypervisor role. Another thing is that it simplifies the rest of the host handling: exiting all the way to the host means that the only possible outcome for this trap is to inject an UNDEF. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
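A sketch of the early fixup described here; the key-saving helper name is hypothetical, while HCR_API/HCR_APK are the real bits that stop PtrAuth accesses from trapping:

    /* Sketch: on the first PtrAuth trap, save the host keys, stop
     * trapping, and re-enter the guest without returning to the host. */
    if (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_PAC) {
        __ptrauth_save_host_keys(vcpu);    /* hypothetical helper */
        vcpu->arch.hcr_el2 |= (HCR_API | HCR_APK);
        write_sysreg(vcpu->arch.hcr_el2, hcr_el2);
        return true;                       /* handled in the fixup path */
    }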
|
#
ef3e40a7 |
|
03-Jun-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Save the host's PtrAuth keys in non-preemptible context When using the PtrAuth feature in a guest, we need to save the host's keys before allowing the guest to program them. For that, we dump them in a per-CPU data structure (the so called host context). But both call sites that do this are in preemptible context, which may end up in disaster should the vcpu thread get preempted before reentering the guest. Instead, save the keys eagerly on each vcpu_load(). This has an increased overhead, but is at least safe. Cc: stable@vger.kernel.org Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
|
#
d56f5136 |
|
04-Jun-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories After commit 63d0434 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") we are creating the per-vCPU debugfs files after the creation of the vCPU file descriptor. This makes it possible for userspace to reach kvm_vcpu_release before kvm_create_vcpu_debugfs has finished. The vcpu->debugfs_dentry then does not have any associated inode anymore, and this causes a NULL-pointer dereference in debugfs_create_file. The solution is simply to avoid removing the files; they are cleaned up when the VM file descriptor is closed (and that must be after KVM_CREATE_VCPU returns). We can stop storing the dentry in struct kvm_vcpu too, because it is not needed anywhere after kvm_create_vcpu_debugfs returns. Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com Fixes: 63d04348371b ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7ae2f3db |
|
30-May-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Flush the instruction cache if not unmapping the VM on reboot On a system with FWB, we don't need to unmap Stage-2 on reboot, as even if userspace takes this opportunity to repaint the whole of memory, FWB ensures that the data side stays consistent even if the guest uses non-cacheable mappings. However, the I-side is not necessarily coherent with the D-side if CTR_EL0.DIC is 0. In this case, invalidate the i-cache to preserve coherency. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Fixes: 892713e97ca1 ("KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported") Signed-off-by: Marc Zyngier <maz@kernel.org>
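Roughly, the reboot path becomes the following (capability and helper names as in the kernel of that era):

    /* Sketch: with FWB the D-side stays coherent, so only the I-side may
     * need attention; __flush_icache_all() is a no-op when CTR_EL0.DIC
     * says the i-cache requires no invalidation. */
    if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
        stage2_unmap_vm(vcpu->kvm);     /* no FWB: full stage 2 unmap */
    else
        __flush_icache_all();           /* FWB: i-cache only */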
|
#
71b3ec5f |
|
15-May-2020 |
David Brazdil <dbrazdil@google.com> |
KVM: arm64: Clean up cpu_init_hyp_mode() Pull bits of code to the only place where they are used. Remove empty function __cpu_init_stage2(). Remove redundant has_vhe() check since this function is nVHE-only. No functional changes intended. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200515152056.83158-1-dbrazdil@google.com
|
#
5107000f |
|
27-Apr-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Make KVM_CAP_MAX_VCPUS compatible with the selected GIC version KVM_CAP_MAX_VCPUS always return the maximum possible number of VCPUs, irrespective of the selected interrupt controller. This is pretty misleading for userspace that selects a GICv2 on a GICv3 system that supports v2 compat: It always gets a maximum of 512 VCPUs, even if the effective limit is 8. The 9th VCPU will fail to be created, which is unexpected as far as userspace is concerned. Fortunately, we already have the right information stashed in the kvm structure, and we can return it as requested. Reported-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20200427141507.284985-1-maz@kernel.org
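A sketch of the fix in the extension handler; the stashed limit is set when the GIC model is selected:

    case KVM_CAP_MAX_VCPUS:
        /* Sketch: report the per-VM limit when queried on a VM fd, and
         * the absolute maximum when queried on the bare KVM fd. */
        if (kvm)
            r = kvm->arch.max_vcpus;
        else
            r = KVM_MAX_VCPUS;
        break;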
|
#
892713e9 |
|
15-Apr-2020 |
Zenghui Yu <yuzenghui@huawei.com> |
KVM: arm64: Sidestep stage2_unmap_vm() on vcpu reset when S2FWB is supported stage2_unmap_vm() was introduced to unmap user RAM region in the stage2 page table to make the caches coherent. E.g., a guest reboot with stage1 MMU disabled will access memory using non-cacheable attributes. If the RAM and caches are not coherent at this stage, some evicted dirty cache line may go and corrupt guest data in RAM. Since ARMv8.4, S2FWB feature is mandatory and KVM will take advantage of it to configure the stage2 page table and the attributes of memory access. So we ensure that guests always access memory using cacheable attributes and thus the caches always remain coherent. So on CPUs that support S2FWB, we can safely reset the vcpu without a heavy stage2 unmapping. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200415072835.1164-1-yuzenghui@huawei.com
|
#
656012c7 |
|
01-Apr-2020 |
Fuad Tabba <tabba@google.com> |
KVM: Fix spelling in code comments Fix spelling and typos (e.g., repeated words) in comments. Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com
|
#
9ed24f4b |
|
13-May-2020 |
Marc Zyngier <maz@kernel.org> |
KVM: arm64: Move virt/kvm/arm to arch/arm64 Now that the 32bit KVM/arm host is a distant memory, let's move the whole of the KVM/arm64 code into the arm64 tree. As they said in the song: Welcome Home (Sanitarium). Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org
|