Cross Reference: /linux-master/arch/arm64/kvm/pkvm.c

History log of /linux-master/arch/arm64/kvm/pkvm.c
Revision	Date	Author	Comments
# 10c02aad	24-Jan-2024	Sebastian Ene <sebastianene@google.com>	KVM: arm64: Fix circular locking dependency The rule inside kvm enforces that the vcpu->mutex is taken inside kvm->lock. The rule is violated by pkvm_create_hyp_vm(), which acquires the kvm->lock while already holding the vcpu->mutex lock from kvm_vcpu_ioctl(). Avoid the circular locking dependency altogether by protecting the hyp vm handle with the config_lock, much like we already do for other forms of VM-scoped data. Signed-off-by: Sebastian Ene <sebastianene@google.com> Cc: stable@vger.kernel.org Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20240124091027.1477174-2-sebastianene@google.com
# fe49fd94	12-Oct-2023	Marc Zyngier <maz@kernel.org>	KVM: arm64: Move VTCR_EL2 into struct s2_mmu We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace sets the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constrain the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create an S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intents and purposes, a guest should be able to have the S2 it really wants, as long as the output address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vtcr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
# fa729bc7	04-Jul-2023	Sudeep Holla <sudeep.holla@arm.com>	KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm Currently there is no synchronisation between finalize_pkvm() and kvm_arm_init() initcalls. finalize_pkvm() proceeds happily even if kvm_arm_init() fails, resulting in the following warning on all the CPUs and eventually a HYP panic: | kvm [1]: IPA Size Limit: 48 bits | kvm [1]: Failed to init hyp memory protection | kvm [1]: error initializing Hyp mode: -22 | | <snip> | | WARNING: CPU: 0 PID: 0 at arch/arm64/kvm/pkvm.c:226 _kvm_host_prot_finalize+0x30/0x50 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | pstate: 634020c5 (nZCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--) | pc : _kvm_host_prot_finalize+0x30/0x50 | lr : __flush_smp_call_function_queue+0xd8/0x230 | | Call trace: | _kvm_host_prot_finalize+0x3c/0x50 | on_each_cpu_cond_mask+0x3c/0x6c | pkvm_drop_host_privileges+0x4c/0x78 | finalize_pkvm+0x3c/0x5c | do_one_initcall+0xcc/0x240 | do_initcall_level+0x8c/0xac | do_initcalls+0x54/0x94 | do_basic_setup+0x1c/0x28 | kernel_init_freeable+0x100/0x16c | kernel_init+0x20/0x1a0 | ret_from_fork+0x10/0x20 | Failed to finalize Hyp protection: -22 | dtb=fvp-base-revc.dtb | kvm [95]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/mem_protect.c:540!
| kvm [95]: nVHE call trace: | kvm [95]: [<ffff800081052984>] __kvm_nvhe_hyp_panic+0xac/0xf8 | kvm [95]: [<ffff800081059644>] __kvm_nvhe_handle_host_mem_abort+0x1a0/0x2ac | kvm [95]: [<ffff80008105511c>] __kvm_nvhe_handle_trap+0x4c/0x160 | kvm [95]: [<ffff8000810540fc>] __kvm_nvhe___skip_pauth_save+0x4/0x4 | kvm [95]: ---[ end nVHE call trace ]--- | kvm [95]: Hyp Offset: 0xfffe8db00ffa0000 | Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 3 PID: 95 Comm: kworker/u16:2 Tainted: G W 6.4.0 #237 | Hardware name: FVP Base RevC (DT) | Workqueue: rpciod rpc_async_schedule | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x138/0x33c | nvhe_hyp_panic_handler+0x100/0x184 | new_slab+0x23c/0x54c | ___slab_alloc+0x3e4/0x770 | kmem_cache_alloc_node+0x1f0/0x278 | __alloc_skb+0xdc/0x294 | tcp_stream_alloc_skb+0x2c/0xf0 | tcp_sendmsg_locked+0x3d0/0xda4 | tcp_sendmsg+0x38/0x5c | inet_sendmsg+0x44/0x60 | sock_sendmsg+0x1c/0x34 | xprt_sock_sendmsg+0xdc/0x274 | xs_tcp_send_request+0x1ac/0x28c | xprt_transmit+0xcc/0x300 | call_transmit+0x78/0x90 | __rpc_execute+0x114/0x3d8 | rpc_async_schedule+0x28/0x48 | process_one_work+0x1d8/0x314 | worker_thread+0x248/0x474 | kthread+0xfc/0x184 | ret_from_fork+0x10/0x20 | SMP: stopping secondary CPUs | Kernel Offset: 0x57c5cb460000 from 0xffff800080000000 | PHYS_OFFSET: 0x80000000 | CPU features: 0x00000000,1035b7a3,ccfe773f | Memory Limit: none | ---[ end Kernel panic - not syncing: HYP panic: | PS:a34023c9 PC:0000f250710b973c ESR:00000000f2000800 | FAR:ffff000800cb00d0 HPFAR:000000000880cb00 PAR:0000000000000000 | VCPU:0000000000000000 ]--- Fix it by checking for the successful initialisation of kvm_arm_init() in finalize_pkvm() before proceeding any further.
Fixes: 87727ba2bb05 ("KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege") Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230704193243.3300506-1-sudeep.holla@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
# bc3888a0	23-May-2023	Will Deacon <will@kernel.org>	KVM: arm64: Allocate pages for hypervisor FF-A mailboxes The FF-A proxy code needs to allocate its own buffer pair for communication with EL3 and for forwarding calls from the host at EL1. Reserve a couple of pages for this purpose and use them to initialise the hypervisor's FF-A buffer structure. Co-developed-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Andrew Walbran <qwandor@google.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230523101828.7328-4-will@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
# 87727ba2	20-Apr-2023	Will Deacon <will@kernel.org>	KVM: arm64: Ensure CPU PMU probes before pKVM host de-privilege Although pKVM supports CPU PMU emulation for non-protected guests since 722625c6f4c5 ("KVM: arm64: Reenable pmu in Protected Mode"), this relies on the PMU driver probing before the host has de-privileged so that the 'kvm_arm_pmu_available' static key can still be enabled by patching the hypervisor text. As it happens, both of these events hang off device_initcall() but the PMU consistently won the race until 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf"). Since then, the host will fail to boot when pKVM is enabled: | hw perfevents: enabled with armv8_pmuv3_0 PMU driver, 7 counters available | kvm [1]: nVHE hyp BUG at: [<ffff8000090366e0>] __kvm_nvhe_handle_host_mem_abort+0x270/0x284! | kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE | kvm [1]: Hyp Offset: 0xfffea41fbdf70000 | Kernel panic - not syncing: HYP panic: | PS:a00003c9 PC:0000dbe04b0c66e0 ESR:00000000f2000800 | FAR:fffffbfffddfcf00 HPFAR:00000000010b0bf0 PAR:0000000000000000 | VCPU:0000000000000000 | CPU: 2 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc7-00083-g0bce6746d154 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0xec/0x108 | show_stack+0x18/0x2c | dump_stack_lvl+0x50/0x68 | dump_stack+0x18/0x24 | panic+0x13c/0x33c | nvhe_hyp_panic_handler+0x10c/0x190 | aarch64_insn_patch_text_nosync+0x64/0xc8 | arch_jump_label_transform+0x4c/0x5c | __jump_label_update+0x84/0xfc | jump_label_update+0x100/0x134 | static_key_enable_cpuslocked+0x68/0xac | static_key_enable+0x20/0x34 | kvm_host_pmu_init+0x88/0xa4 | armpmu_register+0xf0/0xf4 | arm_pmu_acpi_probe+0x2ec/0x368 | armv8_pmu_driver_init+0x38/0x44 | do_one_initcall+0xcc/0x240 Fix the race properly by deferring the de-privilege step to device_initcall_sync().
This will also be needed in future when probing IOMMU devices and allows us to separate the pKVM de-privilege logic from the core hypervisor initialisation path. Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Fixes: 7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf") Tested-by: Fuad Tabba <tabba@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230420123356.2708-1-will@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
# f41dff4e	10-Nov-2022	Quentin Perret <qperret@google.com>	KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Rather than relying on the host to free the previously-donated pKVM hypervisor VM pages explicitly on teardown, introduce a dedicated teardown memcache which allows the host to reclaim guest memory resources without having to keep track of all of the allocations made by the pKVM hypervisor at EL2. Tested-by: Vincent Donnefort <vdonnefort@google.com> Co-developed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> [maz: dropped __maybe_unused from unmap_donated_memory_noclear()] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-21-will@kernel.org
# 9d0c063a	10-Nov-2022	Fuad Tabba <tabba@google.com>	KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 With the pKVM hypervisor at EL2 now offering hypercalls to the host for creating and destroying VM and vCPU structures, plumb these in to the existing arm64 KVM backend to ensure that the hypervisor data structures are allocated and initialised on first vCPU run for a pKVM guest. In the host, 'struct kvm_protected_vm' is introduced to hold the handle of the pKVM VM instance as well as to track references to the memory donated to the hypervisor so that it can be freed back to the host allocator following VM teardown. The stage-2 page-table, hypervisor VM and vCPU structures are allocated separately so as to avoid the need for a large physically-contiguous allocation in the host at run-time. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-14-will@kernel.org
# a1ec5c70	10-Nov-2022	Fuad Tabba <tabba@google.com>	KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Introduce a global table (and lock) to track pKVM instances at EL2, and provide hypercalls that can be used by the untrusted host to create and destroy pKVM VMs and their vCPUs. pKVM VM/vCPU state is directly accessible only by the trusted hypervisor (EL2). Each pKVM VM is directly associated with an untrusted host KVM instance, and is referenced by the host using an opaque handle. Future patches will provide hypercalls to allow the host to initialize/set/get pKVM VM/vCPU state using the opaque handle. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Co-developed-by: Will Deacon <will@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> [maz: silence warning on unmap_donated_memory_noclear()] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-13-will@kernel.org
# 8e6bcc3a	10-Nov-2022	Quentin Perret <qperret@google.com>	KVM: arm64: Back the hypervisor 'struct hyp_page' array for all memory The EL2 'vmemmap' array in nVHE Protected mode is currently very sparse: only memory pages owned by the hypervisor itself have a matching 'struct hyp_page'. However, as the size of this struct has been reduced significantly since its introduction, it appears that we can now afford to back the vmemmap for all of memory. Having an easily accessible 'struct hyp_page' for every physical page in memory provides the hypervisor with a simple mechanism to store metadata (e.g. a refcount) that wouldn't otherwise fit in the very limited number of software bits available in the host stage-2 page-table entries. This will be used in subsequent patches when pinning host memory pages for use by the hypervisor at EL2. Tested-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221110190259.26861-4-will@kernel.org
# 9429f4b0	02-Dec-2021	Will Deacon <will@kernel.org>	KVM: arm64: Move host EL1 code out of hyp/ directory kvm/hyp/reserved_mem.c contains host code executing at EL1 and is not linked into the hypervisor object. Move the file into kvm/pkvm.c and rework the headers so that the definitions shared between the host and the hypervisor live in asm/kvm_pkvm.h. Signed-off-by: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211202171048.26924-4-will@kernel.org
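
Illustrative sketch for revision fa729bc7 above: the fix makes finalize_pkvm() check that kvm_arm_init() succeeded before it drops host privileges. The userspace C model below shows only that guard pattern. Every name in it (`model_kvm_arm_init`, `model_finalize_pkvm`, `kvm_initialised`, `deprivilege_calls`) is a hypothetical stand-in chosen for this example and is not taken from the kernel source; the real function bodies are not shown in this log.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state, not kernel symbols. */
static bool kvm_initialised;   /* set only when the modelled init succeeds */
static int deprivilege_calls;  /* counts how often de-privilege would run */

/* Models the earlier initcall: records success, propagates failure. */
static int model_kvm_arm_init(int hyp_init_result)
{
	if (hyp_init_result)
		return hyp_init_result;   /* e.g. -EINVAL (-22), as in the log */

	kvm_initialised = true;
	return 0;
}

/* Models the later initcall: bails out unless init really succeeded. */
static int model_finalize_pkvm(bool protected_mode)
{
	if (!protected_mode || !kvm_initialised)
		return 0;                 /* nothing to finalise, and no panic */

	deprivilege_calls++;          /* stand-in for dropping host privileges */
	return 0;
}
```

In this model, calling `model_finalize_pkvm(true)` after a failed `model_kvm_arm_init(-EINVAL)` leaves `deprivilege_calls` at zero, which mirrors the intent of the fix: the de-privilege step is skipped instead of running against an uninitialised hypervisor.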