#
75253db4 |
|
25-Jan-2024 |
Brijesh Singh <brijesh.singh@amd.com> |
KVM: SEV: Make AVIC backing, VMSA and VMCB memory allocation SNP safe Implement a workaround for an SNP erratum where the CPU will incorrectly signal an RMP violation #PF if a hugepage (2MB or 1GB) collides with the RMP entry of a VMCB, VMSA or AVIC backing page. When SEV-SNP is globally enabled, the CPU marks the VMCB, VMSA, and AVIC backing pages as "in-use" via a reserved bit in the corresponding RMP entry after a successful VMRUN. This is done for _all_ VMs, not just SNP-Active VMs. If the hypervisor accesses an in-use page through a writable translation, the CPU will throw an RMP violation #PF. On early SNP hardware, if an in-use page is 2MB-aligned and software accesses any part of the associated 2MB region with a hugepage, the CPU will incorrectly treat the entire 2MB region as in-use and signal a an RMP violation #PF. To avoid this, the recommendation is to not use a 2MB-aligned page for the VMCB, VMSA or AVIC pages. Add a generic allocator that will ensure that the page returned is not 2MB-aligned and is safe to be used when SEV-SNP is enabled. Also implement similar handling for the VMCB/VMSA pages of nested guests. [ mdr: Squash in nested guest handling from Ashish, commit msg fixups. ] Reported-by: Alper Gun <alpergun@google.com> # for nested VMSA case Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Marc Orr <marcorr@google.com> Signed-off-by: Marc Orr <marcorr@google.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20240126041126.1927228-22-michael.roth@amd.com
|
#
017a99a9 |
|
05-Dec-2023 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Hide more stuff under CONFIG_KVM_HYPERV/CONFIG_HYPERV 'struct hv_vmcb_enlightenments' in VMCB only make sense when either CONFIG_KVM_HYPERV or CONFIG_HYPERV is enabled. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231205103630.1391318-17-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
af9d544a |
|
05-Dec-2023 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Introduce helper to handle Hyper-V paravirt TLB flush requests As a preparation to making Hyper-V emulation optional, introduce a helper to handle pending KVM_REQ_HV_TLB_FLUSH requests. No functional change intended. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20231205103630.1391318-8-vkuznets@redhat.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
a484755a |
|
18-Oct-2023 |
Sean Christopherson <seanjc@google.com> |
Revert "nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB" Revert KVM's made-up consistency check on SVM's TLB control. The APM says that unsupported encodings are reserved, but the APM doesn't state that VMRUN checks for a supported encoding. Unless something is called out in "Canonicalization and Consistency Checks" or listed as MBZ (Must Be Zero), AMD behavior is typically to let software shoot itself in the foot. This reverts commit 174a921b6975ef959dd82ee9e8844067a62e3ec1. Fixes: 174a921b6975 ("nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB") Reported-by: Stefan Sterz <s.sterz@proxmox.com> Closes: https://lkml.kernel.org/r/b9915c9c-4cf6-051a-2d91-44cc6380f455%40proxmox.com Cc: stable@vger.kernel.org Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20231018194104.1896415-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
2c49db45 |
|
13-Sep-2023 |
Binbin Wu <binbin.wu@linux.intel.com> |
KVM: x86: Add & use kvm_vcpu_is_legal_cr3() to check CR3's legality Add and use kvm_vcpu_is_legal_cr3() to check CR3's legality to provide a clear distinction between CR3 and GPA checks. This will allow exempting bits from kvm_vcpu_is_legal_cr3() without affecting general GPA checks, e.g. for upcoming features that will use high bits in CR3 for feature enabling. No functional change intended. Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Link: https://lore.kernel.org/r/20230913124227.12574-7-binbin.wu@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
3fdc6087 |
|
28-Sep-2023 |
Maxim Levitsky <mlevitsk@redhat.com> |
x86: KVM: SVM: refresh AVIC inhibition in svm_leave_nested() svm_leave_nested() similar to a nested VM exit, get the vCPU out of nested mode and thus should end the local inhibition of AVIC on this vCPU. Failure to do so, can lead to hangs on guest reboot. Raise the KVM_REQ_APICV_UPDATE request to refresh the AVIC state of the current vCPU in this case. Fixes: f44509f849fe ("KVM: x86: SVM: allow AVIC to co-exist with a nested guest running") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230928173354.217464-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b89456ae |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "vGIF enabled" Track "virtual GIF exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "vgif" param means that the code isn't strictly equivalent, as vgif_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are Link: https://lore.kernel.org/r/20230815203653.519297-14-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
59d67fc1 |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "Pause Filter enabled" Track "Pause Filtering is exposed to L1" via governed feature flags instead of using dedicated bits/flags in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-13-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
e183d17a |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "LBRv enabled" Track "LBR virtualization exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, checking KVM's capabilities instead of the "lbrv" param means that the code isn't strictly equivalent, as lbrv_enabled could have been set if nested=false where as that the governed feature cannot. But that's a glorified nop as the feature/flag is consumed only by paths that are gated by nSVM being enabled. Link: https://lore.kernel.org/r/20230815203653.519297-12-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
4d2a1560 |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "vVM{SAVE,LOAD} enabled" Track "virtual VMSAVE/VMLOAD exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Opportunistically add a comment explaining why KVM disallows virtual VMLOAD/VMSAVE when the vCPU model is Intel. No functional change intended. Link: https://lore.kernel.org/r/20230815203653.519297-11-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
4365a455 |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "TSC scaling enabled" Track "TSC scaling exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. Note, this fixes a benign bug where KVM would mark TSC scaling as exposed to L1 even if overall nested SVM supported is disabled, i.e. KVM would let L1 write MSR_AMD64_TSC_RATIO even when KVM didn't advertise TSCRATEMSR support to userspace. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
7a6a6a3b |
|
15-Aug-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use KVM-governed feature framework to track "NRIPS enabled" Track "NRIPS exposed to L1" via a governed feature flag instead of using a dedicated bit/flag in vcpu_svm. No functional change intended. Reviewed-by: Yuan Yao <yuan.yao@intel.com> Link: https://lore.kernel.org/r/20230815203653.519297-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
2d636990 |
|
28-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Always write vCPU's current TSC offset/ratio in vendor hooks Drop the @offset and @multiplier params from the kvm_x86_ops hooks for propagating TSC offsets/multipliers into hardware, and instead have the vendor implementations pull the information directly from the vCPU structure. The respective vCPU fields _must_ be written at the same time in order to maintain consistent state, i.e. it's not random luck that the value passed in by all callers is grabbed from the vCPU. Explicitly grabbing the value from the vCPU field in SVM's implementation in particular will allow for additional cleanup without introducing even more subtle dependencies. Specifically, SVM can skip the WRMSR if guest state isn't loaded, i.e. svm_prepare_switch_to_guest() will load the correct value for the vCPU prior to entering the guest. This also reconciles KVM's handling of related values that are stored in the vCPU, as svm_write_tsc_offset() already assumes/requires the caller to have updated l1_tsc_offset. Link: https://lore.kernel.org/r/20230729011608.1065019-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
c0dc39bd |
|
28-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use the "outer" helper for writing multiplier to MSR_AMD64_TSC_RATIO When emulating nested SVM transitions, use the outer helper for writing the TSC multiplier for L2. Using the inner helper only for one-off cases, i.e. for paths where KVM is NOT emulating or modifying vCPU state, will allow for multiple cleanups: - Explicitly disabling preemption only in the outer helper - Getting the multiplier from the vCPU field in the outer helper - Skipping the WRMSR in the outer helper if guest state isn't loaded Opportunistically delete an extra newline. No functional change intended. Link: https://lore.kernel.org/r/20230729011608.1065019-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
0c94e246 |
|
28-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 state When emulating nested VM-Exit, load L1's TSC multiplier if L1's desired ratio doesn't match the current ratio, not if the ratio L1 is using for L2 diverges from the default. Functionally, the end result is the same as KVM will run L2 with L1's multiplier if L2's multiplier is the default, i.e. checking that L1's multiplier is loaded is equivalent to checking if L2 has a non-default multiplier. However, the assertion that TSC scaling is exposed to L1 is flawed, as userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR); after restoring state in a new VM+vCPU yields an endless supply of: ------------[ cut here ]------------ WARNING: CPU: 10 PID: 206939 at arch/x86/kvm/svm/nested.c:1105 nested_svm_vmexit+0x6af/0x720 [kvm_amd] Call Trace: nested_svm_exit_handled+0x102/0x1f0 [kvm_amd] svm_handle_exit+0xb9/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] ? trace_hardirqs_off+0x4d/0xa0 __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Unlike the nested VMRUN path, hoisting the svm->tsc_scaling_enabled check into the if-statement is wrong as KVM needs to ensure L1's multiplier is loaded in the above scenario. Alternatively, the WARN_ON() could simply be deleted, but that would make KVM's behavior even more subtle, e.g. it's not immediately obvious why it's safe to write MSR_AMD64_TSC_RATIO when checking only tsc_ratio_msr. Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230729011608.1065019-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
7cafe9b8 |
|
28-Jul-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Check instead of asserting on nested TSC scaling support Check for nested TSC scaling support on nested SVM VMRUN instead of asserting that TSC scaling is exposed to L1 if L1's MSR_AMD64_TSC_RATIO has diverged from KVM's default. Userspace can trigger the WARN at will by writing the MSR and then updating guest CPUID to hide the feature (modifying guest CPUID is allowed anytime before KVM_RUN). E.g. hacking KVM's state_test selftest to do vcpu_set_msr(vcpu, MSR_AMD64_TSC_RATIO, 0); vcpu_clear_cpuid_feature(vcpu, X86_FEATURE_TSCRATEMSR); after restoring state in a new VM+vCPU yields an endless supply of: ------------[ cut here ]------------ WARNING: CPU: 164 PID: 62565 at arch/x86/kvm/svm/nested.c:699 nested_vmcb02_prepare_control+0x3d6/0x3f0 [kvm_amd] Call Trace: <TASK> enter_svm_guest_mode+0x114/0x560 [kvm_amd] nested_svm_vmrun+0x260/0x330 [kvm_amd] vmrun_interception+0x29/0x30 [kvm_amd] svm_invoke_exit_handler+0x35/0x100 [kvm_amd] svm_handle_exit+0xe7/0x180 [kvm_amd] kvm_arch_vcpu_ioctl_run+0x1eab/0x2570 [kvm] kvm_vcpu_ioctl+0x4c9/0x5b0 [kvm] __se_sys_ioctl+0x7a/0xc0 __x64_sys_ioctl+0x21/0x30 do_syscall_64+0x41/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x45ca1b Note, the nested #VMEXIT path has the same flaw, but needs a different fix and will be handled separately. Fixes: 5228eb96a487 ("KVM: x86: nSVM: implement nested TSC scaling") Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230729011608.1065019-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
0977cfac |
|
27-Feb-2023 |
Santosh Shukla <santosh.shukla@amd.com> |
KVM: nSVM: Implement support for nested VNMI Allow L1 to use vNMI to accelerate its injection of NMI to L2 by propagating vNMI int_ctl bits from/to vmcb12 to/from vmcb02. To handle both the case where vNMI is enabled for L1 and L2, and where vNMI is enabled for L1 but _not_ L2, move pending L1 vNMIs to nmi_pending on nested VM-Entry and raise KVM_REQ_EVENT, i.e. rely on existing code to route the NMI to the correct domain. On nested VM-Exit, reverse the process and set/clear V_NMI_PENDING for L1 based one whether nmi_pending is zero or non-zero. There is no need to consider vmcb02 in this case, as V_NMI_PENDING can be set in vmcb02 if vNMI is disabled for L2, and if vNMI is enabled for L2, then L1 and L2 have different NMI contexts. Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-12-santosh.shukla@amd.com [sean: massage changelog to match the code] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
5d1ec456 |
|
27-Feb-2023 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: Raise event on nested VM exit if L1 doesn't intercept IRQs If L1 doesn't intercept interrupts, then KVM will use vmcb02's V_IRQ to detect an interrupt window for L1 IRQs. On a subsequent nested VM-Exit, KVM might need to copy the current V_IRQ from vmcb02 to vmcb01 to continue waiting for an interrupt window, i.e. if there is still a pending IRQ for L1. Raise KVM_REQ_EVENT on nested exit if L1 isn't intercepting IRQs to ensure that KVM will re-enable interrupt window detection if needed. Note that this is a theoretical bug because KVM already raises KVM_REQ_EVENT on each nested VM exit, because the nested VM exit resets RFLAGS and kvm_set_rflags() raises the KVM_REQ_EVENT unconditionally. Explicitly raise KVM_REQ_EVENT for the interrupt window case to avoid having an unnecessary dependency on kvm_set_rflags(), and to document the scenario. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [santosh: reworded description as per Sean's v2 comment] Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-4-santosh.shukla@amd.com [sean: further massage changelog and comment] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
7334ede4 |
|
27-Feb-2023 |
Santosh Shukla <Santosh.Shukla@amd.com> |
KVM: nSVM: Disable intercept of VINTR if saved L1 host RFLAGS.IF is 0 Disable intercept of virtual interrupts (used to detect interrupt windows) if the saved host (L1) RFLAGS.IF is '0', as the effective RFLAGS.IF for L1 interrupts will never be set while L2 is running (L2's RFLAGS.IF doesn't affect L1 IRQs when virtual interrupts are enabled). Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lkml.kernel.org/r/Y9hybI65So5X2LFg%40google.com Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-3-santosh.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
5faaffab |
|
27-Feb-2023 |
Santosh Shukla <Santosh.Shukla@amd.com> |
KVM: nSVM: Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting VINTR Don't sync vmcb02 V_IRQ back to vmcb12 if KVM (L0) is intercepting virtual interrupts in order to request an interrupt window, as KVM has usurped vmcb02's int_ctl. If an interrupt window opens before the next VM-Exit, svm_clear_vintr() will restore vmcb12's int_ctl. If no window opens, V_IRQ will be correctly preserved in vmcb12's int_ctl (because it was never recognized while L2 was running). Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lkml.kernel.org/r/Y9hybI65So5X2LFg%40google.com Signed-off-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20230227084016.3368-2-santosh.shukla@amd.com Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
8957cbcf |
|
29-Nov-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: Don't sync tlb_ctl back to vmcb12 on nested VM-Exit Don't sync the TLB control field from vmcb02 to vmcs12 on nested VM-Exit. Per AMD's APM, the field is not modified by hardware: The VMRUN instruction reads, but does not change, the value of the TLB_CONTROL field Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Santosh Shukla <Santosh.Shukla@amd.com> Link: https://lore.kernel.org/r/20221129193717.513824-2-mlevitsk@redhat.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
|
#
2008fab3 |
|
05-Jan-2023 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled Free the APIC access page memslot if any vCPU enables x2APIC and SVM's AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with x2APIC enabled. On AMD, if its "hybrid" mode is enabled (AVIC is enabled when x2APIC is enabled even without x2AVIC support), keeping the APIC access page memslot results in the guest being able to access the virtual APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware that the guest is operating in x2APIC mode. Exempt nested SVM's update of APICv state from the new logic as x2APIC can't be toggled on VM-Exit. In practice, invoking the x2APIC logic should be harmless precisely because it should be a glorified nop, but play it safe to avoid latent bugs, e.g. with dropping the vCPU's SRCU lock. Intel doesn't suffer from the same issue as APICv has fully independent VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM should provide bus error semantics and not memory semantics for the APIC page when x2APIC is enabled, but KVM already provides memory semantics in other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware disabled (via APIC_BASE MSR). Note, checking apic_access_memslot_enabled without taking locks relies it being set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to set the inhibit and delete the memslot, i.e. can get false positives, but can't get false negatives as apic_access_memslot_enabled can't be toggled "on" once any vCPU reaches KVM_RUN. Opportunistically drop the "can" while updating avic_activate_vmcb()'s comment, i.e. to state that KVM _does_ support the hybrid mode. Move the "Note:" down a line to conform to preferred kernel/KVM multi-line comment style. Opportunistically update the apicv_update_lock comment, as it isn't actually used to protect apic_access_memslot_enabled (which is protected by slots_lock). Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode") Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230106011306.85230-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8d20bd63 |
|
30-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Unify pr_fmt to use module name for all KVM modules Define pr_fmt using KBUILD_MODNAME for all KVM x86 code so that printks use consistent formatting across common x86, Intel, and AMD code. In addition to providing consistent print formatting, using KBUILD_MODNAME, e.g. kvm_amd and kvm_intel, allows referencing SVM and VMX (and SEV and SGX and ...) as technologies without generating weird messages, and without causing naming conflicts with other kernel code, e.g. "SEV: ", "tdx: ", "sgx: " etc.. are all used by the kernel for non-KVM subsystems. Opportunistically move away from printk() for prints that need to be modified anyways, e.g. to drop a manual "kvm: " prefix. Opportunistically convert a few SGX WARNs that are similarly modified to WARN_ONCE; in the very unlikely event that the WARNs fire, odds are good that they would fire repeatedly and spam the kernel log without providing unique information in each print. Note, defining pr_fmt yields undesirable results for code that uses KVM's printk wrappers, e.g. vcpu_unimpl(). But, that's a pre-existing problem as SVM/kvm_amd already defines a pr_fmt, and thankfully use of KVM's wrappers is relatively limited in KVM x86 code. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20221130230934.1014142-35-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
74905e3d |
|
09-Nov-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: clarify recalc_intercepts() wrt CR8 The mysterious comment "We only want the cr8 intercept bits of L1" dates back to basically the introduction of nested SVM, back when the handling of "less typical" hypervisors was very haphazard. With the development of kvm-unit-tests for interrupt handling, the same code grew another vmcb_clr_intercept for the interrupt window (VINTR) vmexit, this time with a comment that is at least decent. It turns out however that the same comment applies to the CR8 write intercept, which is also a "recheck if an interrupt should be injected" intercept. The CR8 read intercept instead has not been used by KVM for 14 years (commit 649d68643ebf, "KVM: SVM: sync TPR value to V_TPR field in the VMCB"), so do not bother clearing it and let one comment describe both CR8 write and VINTR handling. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3f4a812e |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: hyper-v: Enable L2 TLB flush Implement Hyper-V L2 TLB flush for nSVM. The feature needs to be enabled both in extended 'nested controls' in VMCB and VP assist page. According to Hyper-V TLFS, synthetic vmexit to L1 is performed with - HV_SVM_EXITCODE_ENL exit_code. - HV_SVM_ENL_EXITCODE_TRAP_AFTER_FLUSH exit_info_1. Note: VP assist page is cached in 'struct kvm_vcpu_hv' so recalc_intercepts() doesn't need to read from guest's memory. KVM needs to update the case upon each VMRUN and after svm_set_nested_state (svm_get_nested_state_pages()) to handle the case when the guest got migrated while L2 was running. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-29-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b0c9c25e |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: Introduce .hv_inject_synthetic_vmexit_post_tlb_flush() nested hook Hyper-V supports injecting synthetic L2->L1 exit after performing L2 TLB flush operation but the procedure is vendor specific. Introduce .hv_inject_synthetic_vmexit_post_tlb_flush nested hook for it. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-22-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e45aa244 |
|
01-Nov-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Keep track of Hyper-V hv_vm_id/hv_vp_id Similar to nSVM, KVM needs to know L2's VM_ID/VP_ID and Partition assist page address to handle L2 TLB flush requests. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-21-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
26b516bb |
|
01-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
x86/hyperv: KVM: Rename "hv_enlightenments" to "hv_vmcb_enlightenments" Now that KVM isn't littered with "struct hv_enlightenments" casts, rename the struct to "hv_vmcb_enlightenments" to highlight the fact that the struct is specifically for SVM's VMCB. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
68ae7c7b |
|
01-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: SVM: Add a proper field for Hyper-V VMCB enlightenments Add a union to provide hv_enlightenments side-by-side with the sw_reserved bytes that Hyper-V's enlightenments overlay. Casting sw_reserved everywhere is messy, confusing, and unnecessarily unsafe. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
089fe572 |
|
01-Nov-2022 |
Sean Christopherson <seanjc@google.com> |
x86/hyperv: Move VMCB enlightenment definitions to hyperv-tlfs.h Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the definitions come directly from the TLFS[*], not from KVM. No functional change intended. [*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields [vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of hyperv-tlfs.h, keep svm/hyperv.h] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20221101145426.251680-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
31e83e21 |
|
29-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: compile out vendor-specific code if SMM is disabled Vendor-specific code that deals with SMI injection and saving/restoring SMM state is not needed if CONFIG_KVM_SMM is disabled, so remove the four callbacks smi_allowed, enter_smm, leave_smm and enable_smi_window. The users in svm/nested.c and x86.c also have to be compiled out; the amount of #ifdef'ed code is small and it's not worth moving it to smm.c. enter_smm is now used only within #ifdef CONFIG_KVM_SMM, and the stub can therefore be removed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-7-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b0b42197 |
|
29-Sep-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: start moving SMM-related functions to new files Create a new header and source with code related to system management mode emulation. Entry and exit will move there too; for now, opportunistically rename put_smstate to PUT_SMSTATE while moving it to smm.h, and adjust the SMM state saving code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220929172016.319443-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
92e7d5c8 |
|
03-Nov-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: allow L1 to not intercept triple fault This is SVM correctness fix - although a sane L1 would intercept SHUTDOWN event, it doesn't have to, so we have to honour this. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-8-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f9697df2 |
|
03-Nov-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: add kvm_leave_nested add kvm_leave_nested which wraps a call to nested_ops->leave_nested into a function. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
16ae56d7 |
|
03-Nov-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use Make sure that KVM uses vmcb01 before freeing nested state, and warn if that is not the case. This is a minimal fix for CVE-2022-3344 making the kernel print a warning instead of a kernel panic. Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20221103141351.50662-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e746c1f1 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Rename inject_pending_events() to kvm_check_and_inject_events() Rename inject_pending_events() to kvm_check_and_inject_events() in order to capture the fact that it handles more than just pending events, and to (mostly) align with kvm_check_nested_events(), which omits the "inject" for brevity. Add a comment above kvm_check_and_inject_events() to provide a high-level synopsis, and to document a virtualization hole (KVM erratum if you will) that exists due to KVM not strictly tracking instruction boundaries with respect to coincident instruction restarts and asynchronous events. No functional change inteded. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-25-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7709aba8 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Morph pending exceptions to pending VM-Exits at queue time Morph pending exceptions to pending VM-Exits (due to interception) when the exception is queued instead of waiting until nested events are checked at VM-Entry. This fixes a longstanding bug where KVM fails to handle an exception that occurs during delivery of a previous exception, KVM (L0) and L1 both want to intercept the exception (e.g. #PF for shadow paging), and KVM determines that the exception is in the guest's domain, i.e. queues the new exception for L2. Deferring the interception check causes KVM to esclate various combinations of injected+pending exceptions to double fault (#DF) without consulting L1's interception desires, and ends up injecting a spurious #DF into L2. KVM has fudged around the issue for #PF by special casing emulated #PF injection for shadow paging, but the underlying issue is not unique to shadow paging in L0, e.g. if KVM is intercepting #PF because the guest has a smaller maxphyaddr and L1 (but not L0) is using shadow paging. Other exceptions are affected as well, e.g. if KVM is intercepting #GP for one of SVM's workaround or for the VMware backdoor emulation stuff. The other cases have gone unnoticed because the #DF is spurious if and only if L1 resolves the exception, e.g. KVM's goofs go unnoticed if L1 would have injected #DF anyways. The hack-a-fix has also led to ugly code, e.g. bailing from the emulator if #PF injection forced a nested VM-Exit and the emulator finds itself back in L1. Allowing for direct-to-VM-Exit queueing also neatly solves the async #PF in L2 mess; no need to set a magic flag and token, simply queue a #PF nested VM-Exit. Deal with event migration by flagging that a pending exception was queued by userspace and check for interception at the next KVM_RUN, e.g. so that KVM does the right thing regardless of the order in which userspace restores nested state vs. event state. When "getting" events from userspace, simply drop any pending excpetion that is destined to be intercepted if there is also an injected exception to be migrated. Ideally, KVM would migrate both events, but that would require new ABI, and practically speaking losing the event is unlikely to be noticed, let alone fatal. The injected exception is captured, RIP still points at the original faulting instruction, etc... So either the injection on the target will trigger the same intercepted exception, or the source of the intercepted exception was transient and/or non-deterministic, thus dropping it is ok-ish. Fixes: a04aead144fd ("KVM: nSVM: fix running nested guests when npt=0") Fixes: feaf0c7dc473 ("KVM: nVMX: Do not generate #DF if #PF happens during exception delivery into L2") Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-22-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
72c14e00 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Formalize blocking of nested pending exceptions Capture nested_run_pending as block_pending_exceptions so that the logic of why exceptions are blocked only needs to be documented once instead of at every place that employs the logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-16-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d4963e31 |
|
30-Aug-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Make kvm_queued_exception a properly named, visible struct Move the definition of "struct kvm_queued_exception" out of kvm_vcpu_arch in anticipation of adding a second instance in kvm_vcpu_arch to handle exceptions that occur when vectoring an injected exception and are morphed to VM-Exit instead of leading to #DF. Opportunistically take advantage of the churn to rename "nr" to "vector". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220830231614.3580124-15-seanjc@google.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
02dfc44f |
|
25-Aug-2022 |
Mingwei Zhang <mizhang@google.com> |
KVM: x86: Print guest pgd in kvm_nested_vmenter() Print guest pgd in kvm_nested_vmenter() to enrich the information for tracing. When tdp is enabled, print the value of tdp page table (EPT/NPT); when tdp is disabled, print the value of non-nested CR3. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20220825225755.907001-4-mizhang@google.com [sean: print nested_cr3 vs. nested_eptp vs. guest_cr3] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
89e54ec5 |
|
25-Aug-2022 |
Mingwei Zhang <mizhang@google.com> |
KVM: x86: Update trace function for nested VM entry to support VMX Update trace function for nested VM entry to support VMX. Existing trace function only supports nested VMX and the information printed out is AMD specific. So, rename trace_kvm_nested_vmrun() to trace_kvm_nested_vmenter(), since 'vmenter' is generic. Add a new field 'isa' to recognize Intel and AMD; Update the output to print out VMX/SVM related naming respectively, eg., vmcb vs. vmcs; npt vs. ept. Opportunistically update the call site of trace_kvm_nested_vmenter() to make one line per parameter. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20220825225755.907001-2-mizhang@google.com [sean: align indentation, s/update/rename in changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c33f6f222 |
|
07-Jun-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4(). On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220607213604.3346000-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
da0b93d6 |
|
18-Jul-2022 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
KVM: nSVM: Pull CS.Base from actual VMCB12 for soft int/ex re-injection enter_svm_guest_mode() first calls nested_vmcb02_prepare_control() to copy control fields from VMCB12 to the current VMCB, then nested_vmcb02_prepare_save() to perform a similar copy of the save area. This means that nested_vmcb02_prepare_control() still runs with the previous save area values in the current VMCB so it shouldn't take the L2 guest CS.Base from this area. Explicitly pull CS.Base from the actual VMCB12 instead in enter_svm_guest_mode(). Granted, having a non-zero CS.Base is a very rare thing (and even impossible in 64-bit mode), having it change between nested VMRUNs is probably even rarer, but if it happens it would create a really subtle bug so it's better to fix it upfront. Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <4caa0f67589ae3c22c311ee0e6139496902f2edc.1658159083.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7a8f7c1f |
|
19-May-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: always intercept x2apic msrs As a preparation for x2avic, this patch ensures that x2apic msrs are always intercepted for the nested guest. Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220519102709.24125-11-suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
938c8745 |
|
24-May-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings Add kvm_caps to hold a variety of capabilites and defaults that aren't handled by kvm_cpu_caps because they aren't CPUID bits in order to reduce the amount of boilerplate code required to add a new feature. The vast majority (all?) of the caps interact with vendor code and are written only during initialization, i.e. should be tagged __read_mostly, declared extern in x86.h, and exported. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
159fc6fa |
|
01-May-2022 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection A NMI that L1 wants to inject into its L2 should be directly re-injected, without causing L0 side effects like engaging NMI blocking for L1. It's also worth noting that in this case it is L1 responsibility to track the NMI window status for its L2 guest. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7e5b5ef8 |
|
01-May-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: SVM: Re-inject INTn instead of retrying the insn on "failure" Re-inject INTn software interrupts instead of retrying the instruction if the CPU encountered an intercepted exception while vectoring the INTn, e.g. if KVM intercepted a #PF when utilizing shadow paging. Retrying the instruction is architecturally wrong e.g. will result in a spurious #DB if there's a code breakpoint on the INT3/O, and lack of re-injection also breaks nested virtualization, e.g. if L1 injects a software interrupt and vectoring the injected interrupt encounters an exception that is intercepted by L0 but not L1. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6ef88d6e |
|
01-May-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction Re-inject INT3/INTO instead of retrying the instruction if the CPU encountered an intercepted exception while vectoring the software exception, e.g. if vectoring INT3 encounters a #PF and KVM is using shadow paging. Retrying the instruction is architecturally wrong, e.g. will result in a spurious #DB if there's a code breakpoint on the INT3/O, and lack of re-injection also breaks nested virtualization, e.g. if L1 injects a software exception and vectoring the injected exception encounters an exception that is intercepted by L0 but not L1. Due to, ahem, deficiencies in the SVM architecture, acquiring the next RIP may require flowing through the emulator even if NRIPS is supported, as the CPU clears next_rip if the VM-Exit is due to an exception other than "exceptions caused by the INT3, INTO, and BOUND instructions". To deal with this, "skip" the instruction to calculate next_rip (if it's not already known), and then unwind the RIP write and any side effects (RFLAGS updates). Save the computed next_rip and use it to re-stuff next_rip if injection doesn't complete. This allows KVM to do the right thing if next_rip was known prior to injection, e.g. if L1 injects a soft event into L2, and there is no backing INTn instruction, e.g. if L1 is injecting an arbitrary event. Note, it's impossible to guarantee architectural correctness given SVM's architectural flaws. E.g. if the guest executes INTn (no KVM injection), an exit occurs while vectoring the INTn, and the guest modifies the code stream while the exit is being handled, KVM will compute the incorrect next_rip due to "skipping" the wrong instruction. A future enhancement to make this less awful would be for KVM to detect that the decoded instruction is not the correct INTn and drop the to-be-injected soft event (retrying is a lesser evil compared to shoving the wrong RIP on the exception stack). Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
00f08d99 |
|
01-May-2022 |
Maciej S. Szmigiero <maciej.szmigiero@oracle.com> |
KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02 The next_rip field of a VMCB is *not* an output-only field for a VMRUN. This field value (instead of the saved guest RIP) in used by the CPU for the return address pushed on stack when injecting a software interrupt or INT3 or INTO exception. Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or loading a nested state and NRIPS is exposed to L1. If NRIPS is supported in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which saves RIP on the stack as-is). Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <c2e0a3d78db3ae30530f11d4e9254b452a89f42b.1651440202.git.maciej.szmigiero@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e3cdaab5 |
|
31-May-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
11d39e8c |
|
06-Jun-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: SVM: fix tsc scaling cache logic SVM uses a per-cpu variable to cache the current value of the tsc scaling multiplier msr on each cpu. Commit 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") broke this caching logic. Refactor the code so that all TSC scaling multiplier writes go through a single function which checks and updates the cache. This fixes the following scenario: 1. A CPU runs a guest with some tsc scaling ratio. 2. New guest with different tsc scaling ratio starts on this CPU and terminates almost immediately. This ensures that the short running guest had set the tsc scaling ratio just once when it was set via KVM_SET_TSC_KHZ. Due to the bug, the per-cpu cache is not updated. 3. The original guest continues to run, it doesn't restore the msr value back to its own value, because the cache matches, and thus continues to run with a wrong tsc scaling ratio. Fixes: 1ab9287add5e2 ("KVM: X86: Add vendor callbacks for writing the TSC multiplier") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6819af75 |
|
03-Mar-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Clean up and document nested #PF workaround Replace the per-vendor hack-a-fix for KVM's #PF => #PF => #DF workaround with an explicit, common workaround in kvm_inject_emulated_page_fault(). Aside from being a hack, the current approach is brittle and incomplete, e.g. nSVM's KVM_SET_NESTED_STATE fails to set ->inject_page_fault(), and nVMX fails to apply the workaround when VMX is intercepting #PF due to allow_smaller_maxphyaddr=1. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
45846661 |
|
06-Apr-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 Remove WARNs that sanity check that KVM never lets a triple fault for L2 escape and incorrectly end up in L1. In normal operation, the sanity check is perfectly valid, but it incorrectly assumes that it's impossible for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through KVM_RUN (which guarantees kvm_check_nested_state() will see and handle the triple fault). The WARN can currently be triggered if userspace injects a machine check while L2 is active and CR4.MCE=0. And a future fix to allow save/restore of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't lost on migration, will make it trivially easy for userspace to trigger the WARN. Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is tempting, but wrong, especially if/when the request is saved/restored, e.g. if userspace restores events (including a triple fault) and then restores nested state (which may forcibly leave guest mode). Ignoring the fact that KVM doesn't currently provide the necessary APIs, it's userspace's responsibility to manage pending events during save/restore. ------------[ cut here ]------------ WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Modules linked in: kvm_intel kvm irqbypass CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel] Call Trace: <TASK> vmx_leave_nested+0x30/0x40 [kvm_intel] vmx_set_nested_state+0xca/0x3e0 [kvm_intel] kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm] kvm_vcpu_ioctl+0x4b9/0x660 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ---[ end trace 0000000000000000 ]--- Fixes: cb6a32c2b877 ("KVM: x86: Handle triple fault in L2 without killing L1") Cc: stable@vger.kernel.org Cc: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220407002315.78092-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f44509f8 |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: SVM: allow AVIC to co-exist with a nested guest running Inhibit the AVIC of the vCPU that is running nested for the duration of the nested run, so that all interrupts arriving from both its vCPU siblings and from KVM are delivered using normal IPIs and cause that vCPU to vmexit. Note that unlike normal AVIC inhibition, there is no need to update the AVIC mmio memslot, because the nested guest uses its own set of paging tables. That also means that AVIC doesn't need to be inhibited VM wide. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322174050.241850-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0b349662 |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: implement nested vGIF In case L1 enables vGIF for L2, the L2 cannot affect L1's GIF, regardless of STGI/CLGI intercepts, and since VM entry enables GIF, this means that L1's GIF is always 1 while L2 is running. Thus in this case leave L1's vGIF in vmcb01, while letting L2 control the vGIF thus implementing nested vGIF. Also allow KVM to toggle L1's GIF during nested entry/exit by always using vmcb01. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322174050.241850-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
74fd41ed |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE Expose the pause filtering and threshold in the guest CPUID and support PAUSE filtering when possible: - If the L0 doesn't intercept PAUSE (cpu_pm=on), then allow L1 to have full control over PAUSE filtering. - if the L1 doesn't intercept PAUSE, use host values and update the adaptive count/threshold even when running nested. - Otherwise always exit to L1; it is not really possible to merge the fields correctly. It is expected that in this case, userspace will not enable this feature in the guest CPUID, to avoid having the guest update both fields pointlessly. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322174050.241850-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d20c796c |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: implement nested LBR virtualization This was tested with kvm-unit-test that was developed for this purpose. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322174050.241850-3-mlevitsk@redhat.com> [Copy all of DEBUGCTL except for reserved bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1d5a1b58 |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running When L2 is running without LBR virtualization, we should ensure that L1's LBR msrs continue to update as usual. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322174050.241850-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
db663af4 |
|
22-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense This makes the code a bit shorter and cleaner. No functional change intended. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220322172449.235575-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b9f3973a |
|
01-Mar-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: implement nested VMLOAD/VMSAVE This was tested by booting L1,L2,L3 (all Linux) and checking that no VMLOAD/VMSAVE vmexits happened. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220301143650.143749-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3cffc89d |
|
04-Feb-2022 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86/mmu: load new PGD after the shadow MMU is initialized Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and shadow_root_level anymore, pull the PGD load after the initialization of the shadow MMUs. Besides being more intuitive, this enables future simplifications and optimizations because it's not necessary anymore to compute the role outside kvm_init_mmu. In particular, kvm_mmu_reset_context was not attempting to use a cached PGD to avoid having to figure out the new role. With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing, and avoid unloading all the cached roots. Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
66c03a92 |
|
02-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Implement Enlightened MSR-Bitmap feature Similar to nVMX commit 502d2bf5f2fd ("KVM: nVMX: Implement Enlightened MSR Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM). Notable differences from nVMX implementation: - As the feature uses SW reserved fields in VMCB control, KVM needs to make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()). - 'msrpm_base_pa' needs to be always be overwritten in nested_svm_vmrun_msrpm(), even when the update is skipped. As an optimization, nested_vmcb02_prepare_control() copies it from VMCB01 so when MSR-Bitmap feature for L2 is disabled nothing needs to be done. - 'struct vmcb_ctrl_area_cached' needs to be extended with clean fields/sw reserved data and __nested_copy_vmcb_control_to_cache() needs to copy it so nested_svm_vmrun_msrpm() can use it later. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
73c25546 |
|
02-Feb-2022 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt Similar to nVMX commit ed2a4800ae9d ("KVM: nVMX: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep track of whether MSR bitmap for L2 needs to be rebuilt due to changes in MSR bitmap for L1 or switching to a different L2. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20220202095100.129834-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e1779c27 |
|
07-Feb-2022 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: fix potential NULL derefernce on nested migration Turns out that due to review feedback and/or rebases I accidentally moved the call to nested_svm_load_cr3 to be too early, before the NPT is enabled, which is very wrong to do. KVM can't even access guest memory at that point as nested NPT is needed for that, and of course it won't initialize the walk_mmu, which is main issue the patch was addressing. Fix this for real. Fixes: 232f75d3b4b5 ("KVM: nSVM: call nested_svm_load_cr3 on nested state load") Cc: stable@vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220207155447.840194-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f7e57078 |
|
25-Jan-2022 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Forcibly leave nested virt when SMM state is toggled Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated. Don't attempt to gracefully handle the transition as (a) most transitions are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active. Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state. WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Modules linked in: CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline] RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656 Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00 Call Trace: <TASK> kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123 kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline] kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460 kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline] kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline] kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250 kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273 __fput+0x286/0x9f0 fs/file_table.c:311 task_work_run+0xdd/0x1a0 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0xb29/0x2a30 kernel/exit.c:806 do_group_exit+0xd2/0x2f0 kernel/exit.c:935 get_signal+0x4b0/0x28c0 kernel/signal.c:2862 arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868 handle_signal_work kernel/entry/common.c:148 [inline] exit_to_user_mode_loop kernel/entry/common.c:172 [inline] exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Cc: stable@vger.kernel.org Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220125220358.2091737-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2df4a5eb |
|
24-Nov-2021 |
Lai Jiangshan <laijs@linux.alibaba.com> |
KVM: X86: Remove mmu parameter from load_pdptrs() It uses vcpu->arch.walk_mmu always; nested EPT does not have PDPTRs, and nested NPT treats them like all other non-leaf page table levels instead of caching them. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211124122055.64424-11-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
aec9c240 |
|
08-Nov-2021 |
Lai Jiangshan <laijs@linux.alibaba.com> |
KVM: SVM: Remove references to VCPU_EXREG_CR3 VCPU_EXREG_CR3 is never cleared from vcpu->arch.regs_avail or vcpu->arch.regs_dirty in SVM; therefore, marking CR3 as available is merely a NOP, and testing it will likewise always succeed. Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com> Message-Id: <20211108124407.12187-9-jiangshanlai@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8fc78909 |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: introduce struct vmcb_ctrl_area_cached This structure will replace vmcb_control_area in svm_nested_state, providing only the fields that are actually used by the nested state. This avoids having and copying around uninitialized fields. The cost of this, however, is that all functions (in this case vmcb_is_intercept) expect the old structure, so they need to be duplicated. In addition, in svm_get_nested_state() user space expects a vmcb_control_area struct, so we need to copy back all fields in a temporary structure before copying it to userspace. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-7-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bd95926c |
|
11-Nov-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: split out __nested_vmcb_check_controls Remove the struct vmcb_control_area parameter from nested_vmcb_check_controls, for consistency with the functions that operate on the save area. This way, VMRUN uses the version without underscores for both areas, while KVM_SET_NESTED_STATE uses the version with underscores. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
355d0473 |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU races Use the already checked svm->nested.save cached fields (EFER, CR0, CR4, ...) instead of vmcb12's in nested_vmcb02_prepare_save(). This prevents from creating TOC/TOU races, since the guest could modify the vmcb12 fields. This also avoids the need of force-setting EFER_SVME in nested_vmcb02_prepare_save. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-6-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b7a3d8b6 |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs() Now that struct vmcb_save_area_cached contains the required vmcb fields values (done in nested_load_save_from_vmcb12()), check them to see if they are correct in nested_vmcb_valid_sregs(). While at it, rename nested_vmcb_valid_sregs in nested_vmcb_check_save. __nested_vmcb_check_save takes the additional @save parameter, so it is helpful when we want to check a non-svm save state, like in svm_set_nested_state. The reason for that is that save is the L1 state, not L2, so we check it without moving it to svm->nested.save. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-5-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7907160d |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: rename nested_load_control_from_vmcb12 in nested_copy_vmcb_control_to_cache Following the same naming convention of the previous patch, rename nested_load_control_from_vmcb12. In addition, inline copy_vmcb_control_area as it is only called by this function. __nested_copy_vmcb_control_to_cache() works with vmcb_control_area parameters and it will be useful in next patches, when we use local variables instead of svm cached state. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f2740a8d |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: introduce svm->nested.save to cache save area before checks This is useful in the next patch, to keep a saved copy of vmcb12 registers and pass it around more easily. Instead of blindly copying everything, we just copy EFER, CR0, CR3, CR4, DR6 and DR7 which are needed by the VMRUN checks. If more fields will need to be checked, it will be quite obvious to see that they must be added in struct vmcb_save_area_cached and in nested_copy_vmcb_save_to_cache(). __nested_copy_vmcb_save_to_cache() takes a vmcb_save_area_cached parameter, which is useful in order to save the state to a local variable. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211103140527.752797-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
907afa48 |
|
03-Nov-2021 |
Emanuele Giuseppe Esposito <eesposit@redhat.com> |
KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in nested_vmcb_valid_sregs Inline nested_vmcb_check_cr3_cr4 as it is not called by anyone else. Doing so simplifies next patches. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211103140527.752797-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
174a921b |
|
20-Sep-2021 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB According to section "TLB Flush" in APM vol 2, "Support for TLB_CONTROL commands other than the first two, is optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid]. All encodings of TLB_CONTROL not defined in the APM are reserved." Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5228eb96 |
|
14-Sep-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: implement nested TSC scaling This was tested by booting a nested guest with TSC=1Ghz, observing the clocks, and doing about 100 cycles of migration. Note that qemu patch is needed to support migration because of a new MSR that needs to be placed in the migration state. The patch will be sent to the qemu mailing list soon. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-14-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0226a45c |
|
14-Sep-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: don't copy pause related settings According to the SDM, the CPU never modifies these settings. It loads them on VM entry and updates an internal copy instead. Also don't load them from the vmcb12 as we don't expose these features to the nested guest yet. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
faf6b755 |
|
14-Sep-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: nSVM: don't copy virt_ext from vmcb12 These field correspond to features that we don't expose yet to L2 While currently there are no CVE worthy features in this field, if AMD adds more features to this field, that could allow guest escapes similar to CVE-2021-3653 and CVE-2021-3656. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e85d3e7b |
|
13-Sep-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: SVM: call KVM_REQ_GET_NESTED_STATE_PAGES on exit from SMM mode Currently the KVM_REQ_GET_NESTED_STATE_PAGES on SVM only reloads PDPTRs, and MSR bitmap, with former not really needed for SMM as SMM exit code reloads them again from SMRAM'S CR3, and later happens to work since MSR bitmap isn't modified while in SMM. Still it is better to be consistient with VMX. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210913140954.165665-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
db105fab |
|
02-Aug-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: remove useless kvm_clear_*_queue For an event to be in injected state when nested_svm_vmrun executes, it must have come from exitintinfo when svm_complete_interrupts ran: vcpu_enter_guest static_call(kvm_x86_run) -> svm_vcpu_run svm_complete_interrupts // now the event went from "exitintinfo" to "injected" static_call(kvm_x86_handle_exit) -> handle_exit svm_invoke_exit_handler vmrun_interception nested_svm_vmrun However, no event could have been in exitintinfo before a VMRUN vmexit. The code in svm.c is a bit more permissive than the one in vmx.c: if (is_external_interrupt(svm->vmcb->control.exit_int_info) && exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR && exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH && exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI) but in any case, a VMRUN instruction would not even start to execute during an attempted event delivery. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c7dfa400 |
|
19-Jul-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: always intercept VMLOAD/VMSAVE when nested (CVE-2021-3656) If L1 disables VMLOAD/VMSAVE intercepts, and doesn't enable Virtual VMLOAD/VMSAVE (currently not supported for the nested hypervisor), then VMLOAD/VMSAVE must operate on the L1 physical memory, which is only possible by making L0 intercept these instructions. Failure to do so allowed the nested guest to run VMLOAD/VMSAVE unintercepted, and thus read/write portions of the host physical memory. Fixes: 89c8a4984fc9 ("KVM: SVM: Enable Virtual VMLOAD VMSAVE feature") Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0f923e07 |
|
14-Jul-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: avoid picking up unsupported bits from L2 in int_ctl (CVE-2021-3653) * Invert the mask of bits that we pick from L2 in nested_vmcb02_prepare_control * Invert and explicitly use VIRQ related bits bitmask in svm_clear_vintr This fixes a security issue that allowed a malicious L1 to run L2 with AVIC enabled, which allowed the L2 to exploit the uninitialized and enabled AVIC to read/write the host physical memory at some offsets. Fixes: 3d6368ef580a ("KVM: SVM: Add VMRUN handler") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
feea0136 |
|
13-Jul-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: SVM: tweak warning about enabled AVIC on nested entry It is possible that AVIC was requested to be disabled but not yet disabled, e.g if the nested entry is done right after svm_vcpu_after_set_cpuid. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210713142023.106183-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2bb16bea |
|
19-Jul-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state() Make svm_copy_vmrun_state()/svm_copy_vmloadsave_state() interface match 'memcpy(dest, src)' to avoid any confusion. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210719090322.625277-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9a9e7481 |
|
16-Jul-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state() To match svm_copy_vmrun_state(), rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state(). Opportunistically add missing braces to 'else' branch in vmload_vmsave_interception(). No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210716144104.465269-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bb00bd9c |
|
27-Jun-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Restore nested control upon leaving SMM If the VM was migrated while in SMM, no nested state was saved/restored, and therefore svm_leave_smm has to load both save and control area of the vmcb12. Save area is already loaded from HSAVE area, so now load the control area as well from the vmcb12. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0a758290 |
|
27-Jun-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Introduce svm_copy_vmrun_state() Separate the code setting non-VMLOAD-VMSAVE state from svm_set_nested_state() into its own function. This is going to be re-used from svm_enter_smm()/svm_leave_smm(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fb79f566 |
|
27-Jun-2021 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN APM states that "The address written to the VM_HSAVE_PA MSR, which holds the address of the page used to save the host state on a VMRUN, must point to a hypervisor-owned page. If this check fails, the WRMSR will fail with a #GP(0) exception. Note that a value of 0 is not considered valid for the VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will fail with a #GP(0) exception." svm_set_msr() already checks that the supplied address is valid, so only check for '0' is missing. Add it to nested_svm_vmrun(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210628104425.391276-3-vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4b639a9f |
|
07-Jul-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: SVM: add module param to control the #SMI interception In theory there are no side effects of not intercepting #SMI, because then #SMI becomes transparent to the OS and the KVM. Plus an observation on recent Zen2 CPUs reveals that these CPUs ignore #SMI interception and never deliver #SMI VMexits. This is also useful to test nested KVM to see that L1 handles #SMIs correctly in case when L1 doesn't intercept #SMI. Finally the default remains the same, the SMI are intercepted by default thus this patch doesn't have any effect unless non default module param value is used. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
616007c8 |
|
22-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Enhance comments for MMU roles and nested transition trickiness Expand the comments for the MMU roles. The interactions with gfn_track PGD reuse in particular are hairy. Regarding PGD reuse, add comments in the nested virtualization flows to call out why kvm_init_mmu() is unconditionally called even when nested TDP is used. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-50-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
16be1d12 |
|
22-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/mmu: Move nested NPT reserved bit calculation into MMU proper Move nested NPT's invocation of reset_shadow_zero_bits_mask() into the MMU proper and unexport said function. Aside from dropping an export, this is a baby step toward eliminating the call entirely by fixing the shadow_root_level confusion. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-19-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
31e96bc6 |
|
22-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Add a comment to document why nNPT uses vmcb01, not vCPU state Add a comment in the nested NPT initialization flow to call out that it intentionally uses vmcb01 instead current vCPU state to get the effective hCR4 and hEFER for L1's NPT context. Note, despite nSVM's efforts to handle the case where vCPU state doesn't reflect L1 state, the MMU may still do the wrong thing due to pulling state from the vCPU instead of the passed in CR0/CR4/EFER values. This will be addressed in future commits. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-16-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
dbc4739b |
|
22-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Fix sizes used to pass around CR0, CR4, and EFER When configuring KVM's MMU, pass CR0 and CR4 as unsigned longs, and EFER as a u64 in various flows (mostly MMU). Passing the params as u32s is functionally ok since all of the affected registers reserve bits 63:32 to zero (enforced by KVM), but it's technically wrong. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210622175739.3610207-15-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c9060662 |
|
09-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Drop pointless @reset_roots from kvm_init_mmu() Remove the @reset_roots param from kvm_init_mmu(), the one user, kvm_mmu_reset_context() has already unloaded the MMU and thus freed and invalidated all roots. This also happens to be why the reset_roots=true paths doesn't leak roots; they're already invalid. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b5129100 |
|
09-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Drop skip MMU sync and TLB flush params from "new PGD" helpers Drop skip_mmu_sync and skip_tlb_flush from __kvm_mmu_new_pgd() now that all call sites unconditionally skip both the sync and flush. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d2e56019 |
|
09-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Move TLB flushing logic (or lack thereof) to dedicated helper Introduce nested_svm_transition_tlb_flush() and use it force an MMU sync and TLB flush on nSVM VM-Enter and VM-Exit instead of sneaking the logic into the __kvm_mmu_new_pgd() call sites. Add a partial todo list to document issues that need to be addressed before the unconditional sync and flush can be modified to look more like nVMX's logic. In addition to making nSVM's forced flushing more overt (guess who keeps losing track of it), the new helper brings further convergence between nSVM and nVMX, and also sets the stage for dropping the "skip" params from __kvm_mmu_new_pgd(). Cc: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609234235.1244004-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
158a48ec |
|
06-Jun-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: avoid loading PDPTRs after migration when possible if new KVM_*_SREGS2 ioctls are used, the PDPTRs are a part of the migration state and are correctly restored by those ioctls. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-9-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b222b0b8 |
|
06-Jun-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: refactor the CR3 reload on migration Document the actual reason why we need to do it on migration and move the call to svm_set_nested_state to be closer to VMX code. To avoid loading the PDPTRs from possibly not up to date memory map, in nested_svm_load_cr3 after the move, move this code to .get_nested_state_pages. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210607090203.133058-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a36dbec6 |
|
06-Jun-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Drop pointless pdptrs_changed() check on nested transition Remove the "PDPTRs unchanged" check to skip PDPTR loading during nested SVM transitions as it's not at all an optimization. Reading guest memory to get the PDPTRs isn't magically cheaper by doing it in pdptrs_changed(), and if the PDPTRs did change, KVM will end up doing the read twice. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210607090203.133058-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b93af02c |
|
09-Jun-2021 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nVMX: nSVM: 'nested_run' should count guest-entry attempts that make it to guest code Currently, the 'nested_run' statistic counts all guest-entry attempts, including those that fail during vmentry checks on Intel and during consistency checks on AMD. Convert this statistic to count only those guest-entries that make it past these state checks and make it to guest code. This will tell us the number of guest-entries that actually executed or tried to execute guest code. Signed-off-by: Krish Sadhukhan <Krish.Sadhukhan@oracle.com> Message-Id: <20210609180340.104248-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
809c7913 |
|
04-May-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: remove a warning about vmcb01 VM exit reason While in most cases, when returning to use the VMCB01, the exit reason stored in it will be SVM_EXIT_VMRUN, on first VM exit after a nested migration this field can contain anything since the VM entry did happen before the migration. Remove this warning to avoid the false positive. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210504143936.1644378-3-mlevitsk@redhat.com> Fixes: 9a7de6ecc3ed ("KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit()") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
063ab16c |
|
04-May-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: always restore the L1's GIF on migration While usually the L1's GIF is set while L2 runs, and usually migration nested state is loaded after a vCPU reset which also sets L1's GIF to true, this is not guaranteed. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210504143936.1644378-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9d290e16 |
|
03-May-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: leave the guest mode prior to loading a nested state This allows the KVM to load the nested state more than once without warnings. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210503125446.1353307-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c74ad08f |
|
03-May-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: fix few bugs in the vmcb02 caching logic * Define and use an invalid GPA (all ones) for init value of last and current nested vmcb physical addresses. * Reset the current vmcb12 gpa to the invalid value when leaving the nested mode, similar to what is done on nested vmexit. * Reset the last seen vmcb12 address when disabling the nested SVM, as it relies on vmcb02 fields which are freed at that point. Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
deee59ba |
|
03-May-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: fix a typo in svm_leave_nested When forcibly leaving the nested mode, we should switch to vmcb01 Fixes: 4995a3685f1b ("KVM: SVM: Use a separate vmcb for the nested L2 guest") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210503125446.1353307-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ee695f22 |
|
12-Apr-2021 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
nSVM: Check addresses of MSR and IO permission maps According to section "Canonicalization and Consistency Checks" in APM vol 2, the following guest state is illegal: "The MSR or IOIO intercept tables extend to a physical address that is greater than or equal to the maximum supported physical address." Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20210412215611.110095-5-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4020da3b |
|
01-Apr-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: x86: pending exceptions must not be blocked by an injected event Injected interrupts/nmi should not block a pending exception, but rather be either lost if nested hypervisor doesn't intercept the pending exception (as in stock x86), or be delivered in exitintinfo/IDT_VECTORING_INFO field, as a part of a VMexit that corresponds to the pending exception. The only reason for an exception to be blocked is when nested run is pending (and that can't really happen currently but still worth checking for). Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
232f75d3 |
|
01-Apr-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: call nested_svm_load_cr3 on nested state load While KVM's MMU should be fully reset by loading of nested CR0/CR3/CR4 by KVM_SET_SREGS, we are not in nested mode yet when we do it and therefore only root_mmu is reset. On regular nested entries we call nested_svm_load_cr3 which both updates the guest's CR3 in the MMU when it is needed, and it also initializes the mmu again which makes it initialize the walk_mmu as well when nested paging is enabled in both host and guest. Since we don't call nested_svm_load_cr3 on nested state load, the walk_mmu can be left uninitialized, which can lead to a NULL pointer dereference while accessing it if we happen to get a nested page fault right after entering the nested guest first time after the migration and we decide to emulate it, which leads to the emulator trying to access walk_mmu->gva_to_gpa which is NULL. Therefore we should call this function on nested state load as well. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
eba04b20 |
|
30-Mar-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Account a variety of miscellaneous allocations Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are a several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9a7de6ec |
|
23-Mar-2021 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: If VMRUN is single-stepped, queue the #DB intercept in nested_svm_vmexit() According to APM, the #DB intercept for a single-stepped VMRUN must happen after the completion of that instruction, when the guest does #VMEXIT to the host. However, in the current implementation of KVM, the #DB intercept for a single-stepped VMRUN happens after the completion of the instruction that follows the VMRUN instruction. When the #DB intercept handler is invoked, it shows the RIP of the instruction that follows VMRUN, instead of of VMRUN itself. This is an incorrect RIP as far as single-stepping VMRUN is concerned. This patch fixes the problem by checking, in nested_svm_vmexit(), for the condition that the VMRUN instruction is being single-stepped and if so, queues the pending #DB intercept so that the #DB is accounted for before we execute L1's next instruction. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oraacle.com> Message-Id: <20210323175006.73249-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8173396e |
|
01-Mar-2021 |
Cathy Avery <cavery@redhat.com> |
KVM: nSVM: Optimize vmcb12 to vmcb02 save area copies Use the vmcb12 control clean field to determine which vmcb12.save registers were marked dirty in order to minimize register copies when switching from L1 to L2. Those vmcb12 registers marked as dirty need to be copied to L0's vmcb02 as they will be used to update the vmcb state cache for the L2 VMRUN. In the case where we have a different vmcb12 from the last L2 VMRUN all vmcb12.save registers must be copied over to L2.save. Tested: kvm-unit-tests kvm selftests Fedora L1 L2 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20210301200844.2000-1-cavery@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d00b99c5 |
|
17-Feb-2021 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Add support for Virtual SPEC_CTRL Newer AMD processors have a feature to virtualize the use of the SPEC_CTRL MSR. Presence of this feature is indicated via CPUID function 0x8000000A_EDX[20]: GuestSpecCtrl. Hypervisors are not required to enable this feature since it is automatically enabled on processors that support it. A hypervisor may wish to impose speculation controls on guest execution or a guest may want to impose its own speculation controls. Therefore, the processor implements both host and guest versions of SPEC_CTRL. When in host mode, the host SPEC_CTRL value is in effect and writes update only the host version of SPEC_CTRL. On a VMRUN, the processor loads the guest version of SPEC_CTRL from the VMCB. When the guest writes SPEC_CTRL, only the guest version is updated. On a VMEXIT, the guest version is saved into the VMCB and the processor returns to only using the host SPEC_CTRL for speculation control. The guest SPEC_CTRL is located at offset 0x2E0 in the VMCB. The effective SPEC_CTRL setting is the guest SPEC_CTRL setting or'ed with the hypervisor SPEC_CTRL setting. This allows the hypervisor to ensure a minimum SPEC_CTRL if desired. This support also fixes an issue where a guest may sometimes see an inconsistent value for the SPEC_CTRL MSR on processors that support this feature. With the current SPEC_CTRL support, the first write to SPEC_CTRL is intercepted and the virtualized version of the SPEC_CTRL MSR is not updated. When the guest reads back the SPEC_CTRL MSR, it will be 0x0, instead of the actual expected value. There isn’t a security concern here, because the host SPEC_CTRL value is or’ed with the Guest SPEC_CTRL value to generate the effective SPEC_CTRL value. KVM writes with the guest's virtualized SPEC_CTRL value to SPEC_CTRL MSR just before the VMRUN, so it will always have the actual value even though it doesn’t appear that way in the guest. The guest will only see the proper value for the SPEC_CTRL register if the guest was to write to the SPEC_CTRL register again. With Virtual SPEC_CTRL support, the save area spec_ctrl is properly saved and restored. So, the guest will always see the proper value when it is read back. Signed-off-by: Babu Moger <babu.moger@amd.com> Message-Id: <161188100955.28787.11816849358413330720.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cc3ed80a |
|
10-Feb-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state This allows to avoid copying of these fields between vmcb01 and vmcb02 on nested guest entry/exit. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3a87c7e0 |
|
02-Mar-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Add helper to synthesize nested VM-Exit without collateral Add a helper to consolidate boilerplate for nested VM-Exits that don't provide any data in exit_info_*. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cb6a32c2 |
|
02-Mar-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Handle triple fault in L2 without killing L1 Synthesize a nested VM-Exit if L2 triggers an emulated triple fault instead of exiting to userspace, which likely will kill L1. Any flow that does KVM_REQ_TRIPLE_FAULT is suspect, but the most common scenario for L2 killing L1 is if L0 (KVM) intercepts a contributory exception that is _not_intercepted by L1. E.g. if KVM is intercepting #GPs for the VMware backdoor, a #GP that occurs in L2 while vectoring an injected #DF will cause KVM to emulate triple fault. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210302174515.2812275-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
63129754 |
|
02-Mar-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: Pass struct kvm_vcpu to exit handlers (and many, many other places) Refactor the svm_exit_handlers API to pass @vcpu instead of @svm to allow directly invoking common x86 exit handlers (in a future patch). Opportunistically convert an absurd number of instances of 'svm->vcpu' to direct uses of 'vcpu' to avoid pointless casting. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210205005750.3841462-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
11f0cbf0 |
|
03-Feb-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Trace VM-Enter consistency check failures Use trace_kvm_nested_vmenter_failed() and its macro magic to trace consistency check failures on nested VMRUN. Tracing such failures by running the buggy VMM as a KVM guest is often the only way to get a precise explanation of why VMRUN failed. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-13-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6906e06d |
|
06-Oct-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: Add missing checks for reserved bits to svm_set_nested_state() The path for SVM_SET_NESTED_STATE needs to have the same checks for the CPU registers, as we have in the VMRUN path for a nested guest. This patch adds those missing checks to svm_set_nested_state(). Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20201006190654.32305-3-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c08f390a |
|
17-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: only copy L1 non-VMLOAD/VMSAVE data in svm_set_nested_state() The VMLOAD/VMSAVE data is not taken from userspace, since it will not be restored on VMEXIT (it will be copied from VMCB02 to VMCB01). For clarity, replace the wholesale copy of the VMCB save area with a copy of that state only. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4bb170a5 |
|
16-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: do not mark all VMCB02 fields dirty on nested vmexit Since L1 and L2 now use different VMCBs, most of the fields remain the same in VMCB02 from one L2 run to the next. Since KVM itself is not looking at VMCB12's clean field, for now not much can be optimized. However, in the future we could avoid more copies if the VMCB12's SEG and DT sections are clean. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7ca62d13 |
|
16-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: do not mark all VMCB01 fields dirty on nested vmexit Since L1 and L2 now use different VMCBs, most of the fields remain the same from one L1 run to the next. svm_set_cr0 and other functions called by nested_svm_vmexit already take care of clearing the corresponding clean bits; only the TSC offset is special. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7c3ecfcd |
|
16-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: do not copy vmcb01->control blindly to vmcb02->control Most fields were going to be overwritten by vmcb12 control fields, or do not matter at all because they are filled by the processor on vmexit. Therefore, we need not copy them from vmcb01 to vmcb02 on vmentry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9e8f0fbf |
|
17-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: rename functions and variables according to vmcbXY nomenclature Now that SVM is using a separate vmcb01 and vmcb02 (and also uses the vmcb12 naming) we can give clearer names to functions that write to and read from those VMCBs. Likewise, variables and parameters can be renamed from nested_vmcb to vmcb12. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4995a368 |
|
13-Jan-2021 |
Cathy Avery <cavery@redhat.com> |
KVM: SVM: Use a separate vmcb for the nested L2 guest svm->vmcb will now point to a separate vmcb for L1 (not nested) or L2 (nested). The main advantages are removing get_host_vmcb and hsave, in favor of concepts that are shared with VMX. We don't need anymore to stash the L1 registers in hsave while L2 runs, but we need to copy the VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. This more or less has the same cost, but code-wise nested_svm_vmloadsave can be reused. This patch omits several optimizations that are possible: - for simplicity there is some wholesale copying of vmcb.control areas which can go away. - we should be able to better use the VMCB01 and VMCB02 clean bits. - another possibility is to always use VMCB01 for VMLOAD and VMSAVE, thus avoiding the copy of VMLOAD/VMSAVE registers from VMCB01 to VMCB02 and back. Tested: kvm-unit-tests kvm self tests Loaded fedora nested guest on fedora Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20201011184818.3609-3-cavery@redhat.com> [Fix conflicts; keep VMCB02 G_PAT up to date whenever guest writes the PAT MSR; do not copy CR4 over from VMCB01 as it is not needed anymore; add a few more comments. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
43c11d91 |
|
05-Mar-2021 |
Dongli Zhang <dongli.zhang@oracle.com> |
KVM: x86: to track if L1 is running L2 VM The new per-cpu stat 'nested_run' is introduced in order to track if L1 VM is running or used to run L2 VM. An example of the usage of 'nested_run' is to help the host administrator to easily track if any L1 VM is used to run L2 VM. Suppose there is issue that may happen with nested virtualization, the administrator will be able to easily narrow down and confirm if the issue is due to nested virtualization via 'nested_run'. For example, whether the fix like commit 88dddc11a8d6 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3c346c0c |
|
31-Mar-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a58d9166 |
|
31-Mar-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: load control fields from VMCB12 before checking them Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d2df592f |
|
18-Feb-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: prepare guest save area while is_guest_mode is true Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control. This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save. Moving is_guest_mode to the end is done since commit 06fc7772690d ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25). However, back then KVM didn't grab a different VMCB when updating the intercepts, it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later recalc_intercepts was introduced in commit 384c63684397 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb. [1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/ Reviewed-by: Sean Christopherson <seanjc@google.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a04aead1 |
|
18-Feb-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: fix running nested guests when npt=0 In case of npt=0 on host, nSVM needs the same .inject_page_fault tweak as VMX has, to make sure that shadow mmu faults are injected as vmexits. It is not clear why this is needed at all, but for now keep the same code as VMX and we'll fix it for both. Based on a patch by Maxim Levitsky <mlevitsk@redhat.com>. Fixes: 7c86663b68ba ("KVM: nSVM: inject exceptions via svm_check_nested_events") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
954f419b |
|
17-Feb-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: move nested vmrun tracepoint to enter_svm_guest_mode This way trace will capture all the nested mode entries (including entries after migration, and from smm) Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210217145718.1217358-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ca29e145 |
|
03-Feb-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: SEV: Treat C-bit as legal GPA bit regardless of vCPU mode Rename cr3_lm_rsvd_bits to reserved_gpa_bits, and use it for all GPA legality checks. AMD's APM states: If the C-bit is an address bit, this bit is masked from the guest physical address when it is translated through the nested page tables. Thus, any access that can conceivably be run through NPT should ignore the C-bit when checking for validity. For features that KVM emulates in software, e.g. MTRRs, there is no clear direction in the APM for how the C-bit should be handled. For such cases, follow the SME behavior inasmuch as possible, since SEV is is essentially a VM-specific variant of SME. For SME, the APM states: In this case the upper physical address bits are treated as reserved when the feature is enabled except where otherwise indicated. Collecting the various relavant SME snippets in the APM and cross- referencing the omissions with Linux kernel code, this leaves MTTRs and APIC_BASE as the only flows that KVM emulates that should _not_ ignore the C-bit. Note, this means the reserved bit checks in the page tables are technically broken. This will be remedied in a future patch. Although the page table checks are technically broken, in practice, it's all but guaranteed to be irrelevant. NPT is required for SEV, i.e. shadowing page tables isn't needed in the common case. Theoretically, the checks could be in play for nested NPT, but it's extremely unlikely that anyone is running nested VMs on SEV, as doing so would require L1 to expose sensitive data to L0, e.g. the entire VMCB. And if anyone is running nested VMs, L0 can't read the guest's encrypted memory, i.e. L1 would need to put its NPT in shared memory, in which case the C-bit will never be set. Or, L1 could use shadow paging, but again, if L0 needs to read page tables, e.g. to load PDPTRs, the memory can't be encrypted if L1 has any expectation of L0 doing the right thing. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bbc2c63d |
|
03-Feb-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Use common GPA helper to check for illegal CR3 Replace an open coded check for an invalid CR3 with its equivalent helper. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2732be90 |
|
03-Feb-2021 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Don't strip host's C-bit from guest's CR3 when reading PDPTRs Don't clear the SME C-bit when reading a guest PDPTR, as the GPA (CR3) is in the guest domain. Barring a bizarre paravirtual use case, this is likely a benign bug. SME is not emulated by KVM, loading SEV guest PDPTRs is doomed as KVM can't use the correct key to read guest memory, and setting guest MAXPHYADDR higher than the host, i.e. overlapping the C-bit, would cause faults in the guest. Note, for SEV guests, stripping the C-bit is technically aligned with CPU behavior, but for KVM it's the greater of two evils. Because KVM doesn't have access to the guest's encryption key, ignoring the C-bit would at best result in KVM reading garbage. By keeping the C-bit, KVM will fail its read (unless userspace creates a memslot with the C-bit set). The guest will still undoubtedly die, as KVM will use '0' for the PDPTR value, but that's preferable to interpreting encrypted data as a PDPTR. Fixes: d0ec49d4de90 ("kvm/x86/svm: Support Secure Memory Encryption within KVM") Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210204000117.3303214-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9a3ecd5e |
|
02-Feb-2021 |
Chenyi Qiang <chenyi.qiang@intel.com> |
KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW DR6_INIT contains the 1-reserved bits as well as the bit that is cleared to 0 when the condition (e.g. RTM) happens. The value can be used to initialize dr6 and also be the XOR mask between the #DB exit qualification (or payload) and DR6. Concerning that DR6_INIT is used as initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which would make the incoming changes for bus lock debug exception more simple. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c1c35cf7 |
|
13-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: cleanup CR3 reserved bits checks If not in long mode, the low bits of CR3 are reserved but not enforced to be zero, so remove those checks. If in long mode, however, the MBZ bits extend down to the highest physical address bit of the guest, excluding the encryption bit. Make the checks consistent with the above, and match them between nested_vmcb_checks and KVM_SET_SREGS. Cc: stable@vger.kernel.org Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests") Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9a78e158 |
|
08-Jan-2021 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS, which may need to be loaded outside guest mode. Therefore we cannot WARN in that case. However, that part of nested_get_vmcs12_pages is _not_ needed at vmentry time. Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling, so that both vmentry and migration (and in the latter case, independent of is_guest_mode) do the parts that are needed. Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3ba: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES Cc: <stable@vger.kernel.org> # 5.10.x Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f2c7ef3b |
|
07-Jan-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit It is possible to exit the nested guest mode, entered by svm_set_nested_state prior to first vm entry to it (e.g due to pending event) if the nested run was not pending during the migration. In this case we must not switch to the nested msr permission bitmap. Also add a warning to catch similar cases in the future. Fixes: a7d5c7ce41ac1 ("KVM: nSVM: delay MSR permission processing to first nested VM run") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
56fe28de |
|
07-Jan-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode We overwrite most of vmcb fields while doing so, so we must mark it as dirty. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
81f76ada |
|
07-Jan-2021 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: correctly restore nested_run_pending on migration The code to store it on the migration exists, but no code was restoring it. One of the side effects of fixing this is that L1->L2 injected events are no longer lost when migration happens with nested run pending. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
8cce12b3 |
|
26-Nov-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: set fixed bits by hand SVM generally ignores fixed-1 bits. Set them manually so that we do not end up by mistake without those bits set in struct kvm_vcpu; it is part of userspace API that KVM always returns value with the bits set. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ee69c92b |
|
06-Oct-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Return bool instead of int for CR4 and SREGS validity checks Rework the common CR4 and SREGS checks to return a bool instead of an int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to clarify the polarity of the return value (which is effectively inverted by this change). No functional changed intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-6-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2fcf4876 |
|
01-Oct-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: implement on demand allocation of the nested state This way we don't waste memory on VMs which don't use nesting virtualization even when the host enabled it for them. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a7d5c7ce |
|
22-Sep-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: delay MSR permission processing to first nested VM run Allow userspace to set up the memory map after KVM_SET_NESTED_STATE; to do so, move the call to nested_svm_vmrun_msrpm inside the KVM_REQ_GET_NESTED_STATE_PAGES handler (which is currently not used by nSVM). This is similar to what VMX does already. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fb0f33fd |
|
28-Aug-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: CR3 MBZ bits are only 63:52 Commit 761e4169346553c180bbd4a383aedd72f905bc9a created a wrong mask for the CR3 MBZ bits. According to APM vol 2, only the upper 12 bits are MBZ. Fixes: 761e41693465 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests", 2020-07-08) Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200829004824.4577-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4c44e8d6 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Add new intercept word in vmcb_control_area The new intercept bits have been added in vmcb control area to support few more interceptions. Here are the some of them. - INTERCEPT_INVLPGB, - INTERCEPT_INVLPGB_ILLEGAL, - INTERCEPT_INVPCID, - INTERCEPT_MCOMMIT, - INTERCEPT_TLBSYNC, Add a new intercept word in vmcb_control_area to support these instructions. Also update kvm_nested_vmrun trace function to support the new addition. AMD documentation for these instructions is available at "AMD64 Architecture Programmer’s Manual Volume 2: System Programming, Pub. 24593 Rev. 3.34(or later)" The documentation can be obtained at the links below: Link: https://www.amd.com/system/files/TechDocs/24593.pdf Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985251547.11252.16994139329949066945.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c62e2e94 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Modify 64 bit intercept field to two 32 bit vectors Convert all the intercepts to one array of 32 bit vectors in vmcb_control_area. This makes it easy for future intercept vector additions. Also update trace functions. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250813.11252.5736581193881040525.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9780d51d |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Modify intercept_exceptions to generic intercepts Modify intercept_exceptions to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_exceptions bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985250037.11252.1361972528657052410.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
30abaa88 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Change intercept_dr to generic intercepts Modify intercept_dr to generic intercepts in vmcb_control_area. Use the generic vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept to set/clear/test the intercept_dr bits. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985249255.11252.10000868032136333355.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
03bfeeb9 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Change intercept_cr to generic intercepts Change intercept_cr to generic intercepts in vmcb_control_area. Use the new vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept where applicable. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985248506.11252.9081085950784508671.stgit@bmoger-ubuntu> [Change constant names. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c45ad722 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: SVM: Introduce vmcb_(set_intercept/clr_intercept/_is_intercept) This is in preparation for the future intercept vector additions. Add new functions vmcb_set_intercept, vmcb_clr_intercept and vmcb_is_intercept using kernel APIs __set_bit, __clear_bit and test_bit espectively. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985247876.11252.16039238014239824460.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a90c1ed9 |
|
11-Sep-2020 |
Babu Moger <babu.moger@amd.com> |
KVM: nSVM: Remove unused field host_intercept_exceptions is not used anywhere. Clean it up. Signed-off-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Jim Mattson <jmattson@google.com> Message-Id: <159985252277.11252.8819848322175521354.stgit@bmoger-ubuntu> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0dd16b5b |
|
27-Aug-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
KVM: nSVM: rename nested vmcb to vmcb12 This is to be more consistient with VMX, and to support upcoming addition of vmcb02 Hopefully no functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827171145.374620-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d5cd6f34 |
|
14-Sep-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state() The save and ctl pointers are passed uninitialized to kfree() when svm_set_nested_state() follows the 'goto out_set_gif' path. While the issue could've been fixed by initializing these on-stack varialbles to NULL, it seems preferable to eliminate 'out_set_gif' label completely as it is not actually a failure path and duplicating a single svm_set_gif() call doesn't look too bad. [ bp: Drop obscure Addresses-Coverity: tag. ] Fixes: 6ccbd29ade0d ("KVM: SVM: nested: Don't allocate VMCB structures on stack") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Joerg Roedel <jroedel@suse.de> Reported-by: Colin King <colin.king@canonical.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Joerg Roedel <jroedel@suse.de> Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
772b81bb |
|
27-Aug-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
SVM: nSVM: setup nested msr permission bitmap on nested state load This code was missing and was forcing the L2 run with L1's msr permission bitmap Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9883764a |
|
27-Aug-2020 |
Maxim Levitsky <mlevitsk@redhat.com> |
SVM: nSVM: correctly restore GIF on vmexit from nesting after migration Currently code in svm_set_nested_state copies the current vmcb control area to L1 control area (hsave->control), under assumption that it mostly reflects the defaults that kvm choose, and later qemu overrides these defaults with L2 state using standard KVM interfaces, like KVM_SET_REGS. However nested GIF (which is AMD specific thing) is by default is true, and it is copied to hsave area as such. This alone is not a big deal since on VMexit, GIF is always set to false, regardless of what it was on VM entry. However in nested_svm_vmexit we were first were setting GIF to false, but then we overwrite the control fields with value from the hsave area. (including the nested GIF field itself if GIF virtualization is enabled). Now on normal vm entry this is not a problem, since GIF is usually false prior to normal vm entry, and this is the value that copied to hsave, and then restored, but this is not always the case when the nested state is loaded as explained above. To fix this issue, move svm_set_gif after we restore the L1 control state in nested_svm_vmexit, so that even with wrong GIF in the saved L1 control area, we still clear GIF as the spec says. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6ccbd29a |
|
07-Sep-2020 |
Joerg Roedel <jroedel@suse.de> |
KVM: SVM: nested: Don't allocate VMCB structures on stack Do not allocate a vmcb_control_area and a vmcb_save_area on the stack, as these structures will become larger with future extenstions of SVM and thus the svm_set_nested_state() function will become a too large stack frame. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-2-joro@8bytes.org
|
#
096586fd |
|
15-Jul-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: nSVM: Correctly set the shadow NPT root level in its MMU role Move the initialization of shadow NPT MMU's shadow_root_level into kvm_init_shadow_npt_mmu() and explicitly set the level in the shadow NPT MMU's role to be the TDP level. This ensures the role and MMU levels are synchronized and also initialized before __kvm_mmu_new_pgd(), which consumes the level when attempting a fast PGD switch. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Fixes: 9fa72119b24db ("kvm: x86: Introduce kvm_mmu_calc_root_page_role()") Fixes: a506fdd223426 ("KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200716034122.5998-2-sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e8af9e9f |
|
10-Jul-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF The "if" that drops the present bit from the page structure fauls makes no sense. It was added by yours truly in order to be bug-compatible with pre-existing code and in order to make the tests pass; however, the tests are wrong. The behavior after this patch matches bare metal. Reported-by: Nadav Amit <namit@vmware.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d82aaef9 |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: use nested_svm_load_cr3() on guest->host switch Make nSVM code resemble nVMX where nested_vmx_load_cr3() is used on both guest->host and host->guest transitions. Also, we can now eliminate unconditional kvm_mmu_reset_context() and speed things up. Note, nVMX has two different paths: load_vmcs12_host_state() and nested_vmx_restore_host_state() and the later is used to restore from 'partial' switch to L2, it always uses kvm_mmu_reset_context(). nSVM doesn't have this yet. Also, nested_svm_vmexit()'s return value is almost always ignored nowadays. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-9-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a506fdd2 |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch Undesired triple fault gets injected to L1 guest on SVM when L2 is launched with certain CR3 values. #TF is raised by mmu_check_root() check in fast_pgd_switch() and the root cause is that when kvm_set_cr3() is called from nested_prepare_vmcb_save() with NPT enabled CR3 points to a nGPA so we can't check it with kvm_is_visible_gfn(). Using generic kvm_set_cr3() when switching to nested guest is not a great idea as we'll have to distinguish between 'real' CR3s and 'nested' CR3s to e.g. not call kvm_mmu_new_pgd() with nGPA. Following nVMX implement nested-specific nested_svm_load_cr3() doing the job. To support the change, nested_svm_load_cr3() needs to be re-ordered with nested_svm_init_mmu_context(). Note: the current implementation is sub-optimal as we always do TLB flush/MMU sync but this is still an improvement as we at least stop doing kvm_mmu_reset_context(). Fixes: 7c390d350f8b ("kvm: x86: Add fast CR3 switch code path") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-8-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bf7dea42 |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: move kvm_set_cr3() after nested_svm_uninit_mmu_context() kvm_mmu_new_pgd() refers to arch.mmu and at this point it still references arch.guest_mmu while arch.root_mmu is expected. Note, the change is effectively a nop: when !npt_enabled, nested_svm_uninit_mmu_context() does nothing (as we don't do nested_svm_init_mmu_context()) and with npt_enabled we don't do kvm_set_cr3(). However, it will matter when we move the call to kvm_mmu_new_pgd into nested_svm_load_cr3(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-7-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
62156f6c |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: introduce nested_svm_load_cr3()/nested_npt_enabled() As a preparatory change for implementing nSVM-specific PGD switch (following nVMX' nested_vmx_load_cr3()), introduce nested_svm_load_cr3() instead of relying on kvm_set_cr3(). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-6-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
59cd9bc5 |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: prepare to handle errors from enter_svm_guest_mode() Some operations in enter_svm_guest_mode() may fail, e.g. currently we suppress kvm_set_cr3() return value. Prepare the code to proparate errors. No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-5-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ebdb3dba |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: reset nested_run_pending upon nested_svm_vmrun_msrpm() failure WARN_ON_ONCE(svm->nested.nested_run_pending) in nested_svm_vmexit() will fire if nested_run_pending remains '1' but it doesn't really need to, we are already failing and not going to run nested guest. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-4-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
0f04a2ac |
|
10-Jul-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu() As a preparatory change for moving kvm_mmu_new_pgd() from nested_prepare_vmcb_save() to nested_svm_init_mmu_context() split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu(). This also makes the code look more like nVMX (kvm_init_shadow_ept_mmu()). No functional change intended. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200710141157.1640173-2-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
761e4169 |
|
07-Jul-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests According to section "Canonicalization and Consistency Checks" in APM vol. 2 the following guest state is illegal: "Any MBZ bit of CR3 is set." "Any MBZ bit of CR4 is set." Suggeted-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <1594168797-29444-3-git-send-email-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a284ba56 |
|
25-Jun-2020 |
Joerg Roedel <jroedel@suse.de> |
KVM: SVM: Add svm_ prefix to set/clr/is_intercept() Make clear the symbols belong to the SVM code when they are built-in. No functional changes. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200625080325.28439-4-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
06e7852c |
|
25-Jun-2020 |
Joerg Roedel <jroedel@suse.de> |
KVM: SVM: Add vmcb_ prefix to mark_*() functions Make it more clear what data structure these functions operate on. No functional changes. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200625080325.28439-3-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
1aef8161 |
|
22-May-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: Check that DR6[63:32] and DR7[64:32] are not set on vmrun of nested guests According to section "Canonicalization and Consistency Checks" in APM vol. 2 the following guest state is illegal: "DR6[63:32] are not zero." "DR7[63:32] are not zero." "Any MBZ bit of EFER is set." Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200522221954.32131-3-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fb7333df |
|
08-Jun-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: fix calls to is_intercept is_intercept takes an INTERCEPT_* constant, not SVM_EXIT_*; because of this, the compiler was removing the body of the conditionals, as if is_intercept returned 0. This unveils a latent bug: when clearing the VINTR intercept, int_ctl must also be changed in the L1 VMCB (svm->nested.hsave), just like the intercept itself is also changed in the L1 VMCB. Otherwise V_IRQ remains set and, due to the VINTR intercept being clear, we get a spurious injection of a vector 0 interrupt on the next L2->L1 vmexit. Reported-by: Qian Cai <cai@lca.pw> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
68fd66f1 |
|
25-May-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Currently, APF mechanism relies on the #PF abuse where the token is being passed through CR2. If we switch to using interrupts to deliver page-ready notifications we need a different way to pass the data. Extent the existing 'struct kvm_vcpu_pv_apf_data' with token information for page-ready notifications. While on it, rename 'reason' to 'flags'. This doesn't change the semantics as we only have reasons '1' and '2' and these can be treated as bit flags but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery making 'reason' name misleading. The newly introduced apf_put_user_ready() temporary puts both flags and token information, this will be changed to put token only when we switch to interrupt based notifications. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200525144125.143875-3-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
cc440cda |
|
13-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: implement KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE Similar to VMX, the state that is captured through the currently available IOCTLs is a mix of L1 and L2 state, dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. In particular, the SVM-specific state for nested virtualization includes the L1 saved state (including the interrupt flag), the cached L2 controls, and the GIF. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
929d1cfa |
|
19-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu This allows fetching the registers from the hsave area when setting up the NPT shadow MMU, and is needed for KVM_SET_NESTED_STATE (which runs long after the CR0, CR4 and EFER values in vcpu have been switched to hold L2 guest state). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
c513f484 |
|
18-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: leave guest mode when clearing EFER.SVME According to the AMD manual, the effect of turning off EFER.SVME while a guest is running is undefined. We make it leave guest mode immediately, similar to the effect of clearing the VMX bit in MSR_IA32_FEAT_CTL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ca46d739 |
|
18-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: split nested_vmcb_check_controls The authoritative state does not come from the VMCB once in guest mode, but KVM_SET_NESTED_STATE can still perform checks on L1's provided SVM controls because we get them from userspace. Therefore, split out a function to do them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
08245e6d |
|
19-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: remove HF_HIF_MASK The L1 flags can be found in the save area of svm->nested.hsave, fish it from there so that there is one fewer thing to migrate. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e9fd761a |
|
13-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: remove HF_VINTR_MASK Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can use it instead of vcpu->arch.hflags to check whether L2 is running in V_INTR_MASKING mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
36e2e983 |
|
22-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: synthesize correct EXITINTINFO on vmexit This bit was added to nested VMX right when nested_run_pending was introduced, but it is not yet there in nSVM. Since we can have pending events that L0 injected directly into L2 on vmentry, we have to transfer them into L1's queue. For this to work, one important change is required: svm_complete_interrupts (which clears the "injected" fields from the previous VMRUN, and updates them from svm->vmcb's EXITINTINFO) must be placed before we inject the vmexit. This is not too scary though; VMX even does it in vmx_vcpu_run. While at it, the nested_vmexit_inject tracepoint is moved towards the end of nested_svm_vmexit. This ensures that the synthesized EXITINTINFO is visible in the trace. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
91b7130c |
|
21-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: preserve VGIF across VMCB switch There is only one GIF flag for the whole processor, so make sure it is not clobbered when switching to L2 (in which case we also have to include the V_GIF_ENABLE_MASK, lest we confuse enable_gif/disable_gif/gif_set). When going back, L1 could in theory have entered L2 without issuing a CLGI so make sure the svm_set_gif is done last, after svm->vmcb->control.int_ctl has been copied back from hsave. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
ffdf7f9e |
|
21-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: extract svm_set_gif Extract the code that is needed to implement CLGI and STGI, so that we can run it from VMRUN and vmexit (and in the future, KVM_SET_NESTED_STATE). Skip the request for KVM_REQ_EVENT unless needed, subsuming the evaluate_pending_interrupts optimization that is found in enter_svm_guest_mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2d8a42be |
|
22-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: synchronize VMCB controls updated by the processor on every vmexit The control state changes on every L2->L0 vmexit, and we will have to serialize it in the nested state. So keep it up to date in svm->nested.ctl and just copy them back to the nested VMCB in nested_svm_vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e670bf68 |
|
13-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: save all control fields in svm->nested In preparation for nested SVM save/restore, store all data that matters from the VMCB control area into svm->nested. It will then become part of the nested SVM state that is saved by KVM_SET_NESTED_STATE and restored by KVM_GET_NESTED_STATE, just like the cached vmcs12 for nVMX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2f675917 |
|
18-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: pass vmcb_control_area to copy_vmcb_control_area This will come in handy when we put a struct vmcb_control_area in svm->nested. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
18fc6c55 |
|
18-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: clean up tsc_offset update Use l1_tsc_offset to compute svm->vcpu.arch.tsc_offset and svm->vmcb->control.tsc_offset, instead of relying on hsave. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
69cb8774 |
|
22-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: move MMU setup to nested_prepare_vmcb_control Everything that is needed during nested state restore is now part of nested_prepare_vmcb_control. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f241d711 |
|
18-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: extract preparation of VMCB for nested run Split out filling svm->vmcb.save and svm->vmcb.control before VMRUN. Only the latter will be useful when restoring nested SVM state. This patch introduces no semantic change, so the MMU setup is still done in nested_prepare_vmcb_save. The next patch will clean up things. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
3e06f016 |
|
13-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: extract load_nested_vmcb_control When restoring SVM nested state, the control state cache in svm->nested will have to be filled, but the save state will not have to be moved into svm->vmcb. Therefore, pull the code that handles the control area into a separate function. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
69c9dfa2 |
|
12-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: move map argument out of enter_svm_guest_mode Unmapping the nested VMCB in enter_svm_guest_mode is a bit of a wart, since the map argument is not used elsewhere in the function. There are just two callers, and those are also the place where kvm_vcpu_map is called, so it is cleaner to unmap there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
978ce583 |
|
20-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: always update CR3 in VMCB svm_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as an optimization, but this is only correct before the nested vmentry. If userspace is modifying CR3 with KVM_SET_SREGS after the VM has already been put in guest mode, the value of CR3 will not be updated. Remove the optimization, which almost never triggers anyway. This was was added in commit 689f3bf21628 ("KVM: x86: unify callbacks to load paging root", 2020-03-16) just to keep the two vendor-specific modules closer, but we'll fix VMX too. Fixes: 689f3bf21628 ("KVM: x86: unify callbacks to load paging root") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5b672408 |
|
16-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: correctly inject INIT vmexits The usual drill at this point, except there is no code to remove because this case was not handled at all. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bd279629 |
|
16-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: remove exit_required All events now inject vmexits before vmentry rather than after vmexit. Therefore, exit_required is not set anymore and we can remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7c86663b |
|
16-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: inject exceptions via svm_check_nested_events This allows exceptions injected by the emulator to be properly delivered as vmexits. The code also becomes simpler, because we can just let all L0-intercepted exceptions go through the usual path. In particular, our emulation of the VMX #DB exit qualification is very much simplified, because the vmexit injection path can use kvm_deliver_exception_payload to update DR6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
b6162e82 |
|
27-May-2020 |
Vitaly Kuznetsov <vkuznets@redhat.com> |
KVM: nSVM: Preserve registers modifications done before nested_svm_vmexit() L2 guest hang is observed after 'exit_required' was dropped and nSVM switched to check_nested_events() completely. The hang is a busy loop when e.g. KVM is emulating an instruction (e.g. L2 is accessing MMIO space and we drop to userspace). After nested_svm_vmexit() and when L1 is doing VMRUN nested guest's RIP is not advanced so KVM goes into emulating the same instruction which caused nested_svm_vmexit() and the loop continues. nested_svm_vmexit() is not new, however, with check_nested_events() we're now calling it later than before. In case by that time KVM has modified register state we may pick stale values from VMCB when trying to save nested guest state to nested VMCB. nVMX code handles this case correctly: sync_vmcs02_to_vmcs12() called from nested_vmx_vmexit() does e.g 'vmcs12->guest_rip = kvm_rip_read(vcpu)' and this ensures KVM-made modifications are preserved. Do the same for nSVM. Generally, nested_vmx_vmexit()/nested_svm_vmexit() need to pick up all nested guest state modifications done by KVM after vmexit. It would be great to find a way to express this in a way which would not require to manually track these changes, e.g. nested_{vmcb,vmcs}_get_field(). Co-debugged-with: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200527090102.220647-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6c0238c4 |
|
20-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: leave ASID aside in copy_vmcb_control_area Restoring the ASID from the hsave area on VMEXIT is wrong, because its value depends on the handling of TLB flushes. Just skipping the field in copy_vmcb_control_area will do. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
a3535be7 |
|
16-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: fix condition for filtering async PF Async page faults have to be trapped in the host (L1 in this case), since the APF reason was passed from L0 to L1 and stored in the L1 APF data page. This was completely reversed: the page faults were passed to the guest, a L2 hypervisor. Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
e93fd3b3 |
|
01-May-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86/mmu: Capture TDP level when updating CPUID Snapshot the TDP level now that it's invariant (SVM) or dependent only on host capabilities and guest CPUID (VMX). This avoids having to call kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or calculating the page role, and thus avoids the associated retpoline. Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is active is legal, if dodgy. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
221e7610 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: Preserve IRQ/NMI/SMI priority irrespective of exiting behavior Short circuit vmx_check_nested_events() if an unblocked IRQ/NMI/SMI is pending and needs to be injected into L2, priority between coincident events is not dependent on exiting behavior. Fixes: b518ba9fa691 ("KVM: nSVM: implement check_nested_events for interrupts") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
fc6f7c03 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: Report interrupts as allowed when in L2 and exit-on-interrupt is set Report interrupts as allowed when the vCPU is in L2 and L2 is being run with exit-on-interrupts enabled and EFLAGS.IF=1 (either on the host or on the guest according to VINTR). Interrupts are always unblocked from L1's perspective in this case. While moving nested_exit_on_intr to svm.h, use INTERCEPT_INTR properly instead of assuming it's zero (which it is of course). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
55714cdd |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: Move SMI vmexit handling to svm_check_nested_events() Unlike VMX, SVM allows a hypervisor to take a SMI vmexit without having any special SMM-monitor enablement sequence. Therefore, it has to be handled like interrupts and NMIs. Check for an unblocked SMI in svm_check_nested_events() so that pending SMIs are correctly prioritized over IRQs and NMIs when the latter events will trigger VM-Exit. Note that there is no need to test explicitly for SMI vmexits, because guests always runs outside SMM and therefore can never get an SMI while they are blocked. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
bbdad0b5 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: Report NMIs as allowed when in L2 and Exit-on-NMI is set Report NMIs as allowed when the vCPU is in L2 and L2 is being run with Exit-on-NMI enabled, as NMIs are always unblocked from L1's perspective in this case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
9c3d370a |
|
14-Apr-2020 |
Cathy Avery <cavery@redhat.com> |
KVM: SVM: Implement check_nested_events for NMI Migrate nested guest NMI intercept processing to new check_nested_events. Signed-off-by: Cathy Avery <cavery@redhat.com> Message-Id: <20200414201107.22952-2-cavery@redhat.com> [Reorder clauses as NMIs have higher priority than IRQs; inject immediate vmexit as is now done for IRQ vmexits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
6e085cbf |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: immediately inject INTR vmexit We can immediately leave SVM guest mode in svm_check_nested_events now that we have the nested_run_pending mechanism. This makes things easier because we can run the rest of inject_pending_event with GIF=0, and KVM will naturally end up requesting the next interrupt window. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
38c0b192 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: leave halted state on vmexit Similar to VMX, we need to leave the halted state when performing a vmexit. Failure to do so will cause a hang after vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f74f9414 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: introduce nested_run_pending We want to inject vmexits immediately from svm_check_nested_events, so that the interrupt/NMI window requests happen in inject_pending_event right after it returns. This however has the same issue as in vmx_check_nested_events, so introduce a nested_run_pending flag with the exact same purpose of delaying vmexit injection after the vmentry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
d67668e9 |
|
06-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
5679b803 |
|
04-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6 kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
2c19dba6 |
|
07-May-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
7c67f546 |
|
23-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: SVM: do not allow VMRUN inside SMM VMRUN is not supported inside the SMM handler and the behavior is undefined. Just raise a #UD. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
33b22172 |
|
17-Apr-2020 |
Paolo Bonzini <pbonzini@redhat.com> |
KVM: x86: move nested-related kvm_x86_ops to a separate struct Clean up some of the patching of kvm_x86_ops, by moving kvm_x86_ops related to nested virtualization into a separate struct. As a result, these ops will always be non-NULL on VMX. This is not a problem: * check_nested_events is only called if is_guest_mode(vcpu) returns true * get_nested_state treats VMXOFF state the same as nested being disabled * set_nested_state fails if you attempt to set nested state while nesting is disabled * nested_enable_evmcs could already be called on a CPU without VMX enabled in CPUID. * nested_get_evmcs_version was fixed in the previous patch Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
4f233371 |
|
09-Apr-2020 |
Krish Sadhukhan <krish.sadhukhan@oracle.com> |
KVM: nSVM: Check for CR0.CD and CR0.NW on VMRUN of nested guests According to section "Canonicalization and Consistency Checks" in APM vol. 2, the following guest state combination is illegal: "CR0.CD is zero and CR0.NW is set" Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Message-Id: <20200409205035.16830-2-krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
f55ac304 |
|
20-Mar-2020 |
Sean Christopherson <seanjc@google.com> |
KVM: x86: Drop @invalidate_gpa param from kvm_x86_ops' tlb_flush() Drop @invalidate_gpa from ->tlb_flush() and kvm_vcpu_flush_tlb() now that all callers pass %true for said param, or ignore the param (SVM has an internal call to svm_flush_tlb() in svm_flush_tlb_guest that somewhat arbitrarily passes %false). Remove __vmx_flush_tlb() as it is no longer used. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200320212833.3507-17-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
#
883b0a91 |
|
24-Mar-2020 |
Joerg Roedel <jroedel@suse.de> |
KVM: SVM: Move Nested SVM Implementation to nested.c Split out the code for the nested SVM implementation and move it to a separate file. Signed-off-by: Joerg Roedel <jroedel@suse.de> Message-Id: <20200324094154.32352-3-joro@8bytes.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|